WO2013112894A1 - Methods and systems for managing patient data

Info

Publication number: WO2013112894A1
Application number: PCT/US2013/023229
Authority: WO
Inventors: Rob WYNDEN; Hari Krishna REKAPALLI
Original assignee: The Regents Of The University Of California
Priority date: 2012-01-27
Filing date: 2013-01-25
Publication date: 2013-08-01

Abstract

Systems and methods of managing data via a Semantic Graph Database are provided. Aspects of the methods may include semantically harmonizing unnormalized patient-specific data after it has already been loaded into a data warehouse. Aspects may further include constructing a retrospective record of a patient's health. Systems for use in practicing methods of the invention are also provided.

Description

METHODS AND SYSTEMS FOR MANAGING PATIENT DATA

CROSS-REFERENCE TO RELATED APPLICATION

[0001] Pursuant to 35 U.S.C. § 119 (e), this application claims priority to the filing date of the United States Provisional Patent Application Serial No. 61/591,431, filed January 27, 2012; the disclosure of which is incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

[0002] This invention was made with government support under grant no. RR024131, awarded by the National Institutes of Health. The government has certain rights in the invention.

BACKGROUND

[0003] Hospitals and other health care providers now generate and store a wealth of data in electronic format. For example, when a patient is treated at a hospital, the hospital may store certain data about that patient's visit, including clinical data or billing claims data. Clinical data may include, for example, information such as the particular condition(s) from which the patient was suffering or for which the patient was treated. Clinical data is often in structured format, but a large fraction may also include unstructured text information, such as a treating physician's notes. Claims data may include, for example, billing codes for the particular treatments performed by the hospital.

[0004] This electronic data is often stored via data warehousing technology. In traditional data warehousing technology, only structured data may be stored within a predefined schema. This restriction leads to complications when applying that technology to biomedical research applications, because in biomedical science the information model required is frequently not known at the time data is loaded into the warehouse. Further, tasks such as sharing data between different health care providers and aggregating data from a plurality of health care providers can be difficult, because the information may be stored in different or incompatible ways by the individual providers.

SUMMARY

[0005] Systems and methods of managing data via a Semantic Graph Database (SemanticGraphDB) are provided. Aspects of the methods include semantically harmonizing unnormalized patient-specific data after it has already been loaded into a data warehouse. Aspects further include constructing a retrospective record of a patient's health. Systems for use in practicing methods of the present disclosure are also provided.

[0006] In certain aspects, a feature of the data to be managed by the systems and methods provided herein is that the data includes patient-specific structured clinical datum, unstructured clinical datum, or claims datum. In particular aspects, the data managed via a SemanticGraphDB includes all three types of datum. Additionally, the systems and methods provided herein may allow for the storage of one or more relations between each clinical datum stored for a patient and one or more biomedical terminologies. The data may thus be aggregated in a semantically meaningful way to enable useful analyses, such as for quality improvement and comparative effectiveness research.

[0007] Methods of the present disclosure include managing data via a SemanticGraphDB. In certain aspects, such methods include normalizing unnormalized patient-specific clinical data stored in a plurality of data warehouses by applying biomedical terminologies and ontologies from a terminology server, thereby producing normalized patient-specific clinical data; translating the normalized patient-specific clinical data by processing with an ontology mapping engine, wherein the ontology mapping engine receives input terminologies from the terminology server, thereby producing translated and normalized patient-specific clinical data; and loading the translated and normalized patient-specific clinical data into a graph database. Such methods may involve removing personal health information (PHI) from the unnormalized patient-specific clinical data stored in the data warehouse prior to the loading of the translated and normalized patient-specific clinical data into a graph database. The methods may further include querying the graph database after the translated and normalized patient-specific clinical data has been loaded into the graph database.

[0008] Methods of the present disclosure also include methods of mapping ICD-9 medical diagnoses to ICD-10 medical diagnoses. In certain aspects, such methods include normalizing unnormalized patient-specific clinical data stored in a data warehouse by applying biomedical terminologies and ontologies from a terminology server, thereby producing normalized patient-specific clinical data, wherein the unnormalized patient-specific clinical data includes structured clinical data, unstructured clinical data, and claims data; translating the normalized patient-specific clinical data by using an ontology mapping engine to associate ICD-9 codes with diagnoses; loading the translated clinical data into a graph database; and mapping ICD-9 medical diagnoses to ICD-10 medical diagnoses by applying a plurality of cross-walk algorithms to the graph database.

[0009] Methods of the present disclosure also include methods of constructing a retrospective record of a patient's health. Methods of constructing a retrospective record of a patient's health may include aggregating unnormalized patient-specific structured clinical data, claims data, and unstructured clinical data that is stored in a data warehouse; normalizing the aggregated data by applying biomedical terminologies and ontologies retrieved from a terminology server; translating the normalized clinical data using an ontology mapping engine; and loading the translated clinical data into a graph database. Such methods may include searching the graph database, such as by using a faceted search environment.

[0010] In certain instances, methods of the present disclosure may include evaluating the quality of care given to patients. Such methods may include constructing retrospective records of health of a plurality of patients by performing, for each patient, the steps of: aggregating unnormalized patient-specific structured clinical data, claims data, and unstructured clinical data that is stored in a data warehouse; normalizing the aggregated data by applying biomedical terminologies and ontologies retrieved from a terminology server; translating the normalized clinical data using an ontology mapping engine; and loading the translated clinical data into a graph database; clustering the retrospective health records based upon diagnosis and outcome; and identifying whether a patient is at risk of receiving lesser quality of care based upon one or more differences between the patient's retrospective record of health and the retrospective records of health of the plurality of patients within the same diagnosis cluster.

[0011] Such methods may include interfacing with an access control device, wherein the access control device is used at least in part to allow or deny access to particular biomedical terminologies or ontologies. Whether to allow or deny access to particular biomedical terminologies or ontologies may be based on a number of factors, including but not limited to whether such terminologies or ontologies are public or proprietary, whether the user has purchased a license permitting such access, and whether such terminologies or ontologies meet some predefined threshold of reliability.
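By way of non-limiting illustration, the allow-or-deny decision described above might be sketched as follows. The class `Terminology`, the function `may_access`, the numeric `reliability` score, and the default threshold are all hypothetical names and values introduced only for this sketch; the disclosure does not specify how reliability is measured or how licenses are recorded.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Terminology:
    name: str
    proprietary: bool   # False means the terminology is public
    reliability: float  # hypothetical score between 0 and 1


def may_access(term: Terminology, licensed: Set[str],
               min_reliability: float = 0.5) -> bool:
    """Return True if the user may use the terminology.

    Access is denied when the terminology falls below the reliability
    threshold, or when it is proprietary and the user holds no license.
    """
    if term.reliability < min_reliability:
        return False
    if term.proprietary and term.name not in licensed:
        return False
    return True


public_vocab = Terminology("PublicVocab", proprietary=False, reliability=0.9)
vendor_vocab = Terminology("VendorVocab", proprietary=True, reliability=0.9)
draft_vocab = Terminology("DraftVocab", proprietary=False, reliability=0.2)
```

In this sketch a public terminology above the threshold is always allowed, while `vendor_vocab` is allowed only when `"VendorVocab"` appears in the user's set of licenses.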

[0012] Systems for managing data via a SemanticGraphDB are also provided. In some instances, such systems may practice methods of the present disclosure. In certain instances, systems of the present disclosure may include a graph database for storing normalized patient-specific clinical data, wherein the data includes structured clinical datum, unstructured clinical datum, or claims datum; and one or more relations between each clinical datum stored for a patient and one or more biomedical terminologies; a data warehouse which includes unnormalized patient-specific clinical data; a terminology server which includes biomedical terminologies and ontologies; and a processor operatively coupled to the terminology server, the data warehouse and the graph database, and configured to populate the graph database with normalized patient-specific clinical data and relations by normalizing patient-specific clinical data from the data warehouse using terminologies and ontologies retrieved from the terminology server.

[0013] In other aspects, systems of the present disclosure include distributed grid medical informatics systems that implement a plurality of Semantic Graph Databases. Such a system may include a plurality of graph databases for storing normalized patient-specific clinical data, wherein the data includes structured clinical datum, unstructured clinical datum, or claims datum; and one or more relations between each clinical datum stored for a patient and one or more biomedical terminologies; a plurality of data warehouses that include unnormalized patient-specific clinical data; a terminology server that includes biomedical terminologies and ontologies; and a plurality of processors each operatively coupled to the terminology server, a data warehouse and a graph database, and configured to populate the graph database with normalized patient-specific clinical data and relations by normalizing patient-specific clinical data from the data warehouse using terminologies and ontologies retrieved from the terminology server. The system may include a search interface, which may allow a user to search the normalized patient-specific clinical data from one or more graph databases using one or more biomedical terminologies.

[0014] In certain aspects, systems are provided for semantically harmonizing unnormalized patient-specific data after it has already been loaded into a data warehouse. Such a system may include a graph database for storing semantically harmonized patient-specific clinical data; a data warehouse that includes unnormalized patient-specific clinical data; a terminology server that includes biomedical terminologies and ontologies, the terminology server in communication with the graph database and the data warehouse; and a processor operatively coupled to the terminology server, the data warehouse and the graph database, and configured to populate the graph database with semantically harmonized patient-specific clinical data by applying biomedical terminologies and ontologies retrieved from the terminology server to the unnormalized patient-specific clinical data. In such systems, the data warehouse system may have the ability to store both structured clinical data and unstructured clinical data. The processor may be configured to populate the graph database with the relations between each patient-specific clinical datum and one or more biomedical terminologies from the terminology server.

[0015] Where desired, the normalizing performed in the systems disclosed herein may include mapping a patient-specific clinical datum with a terminology retrieved from the terminology server, and returning the mapped value. Such normalizing may include the terminology server sending an algorithm to the processor. The algorithm may then be used by the processor to normalize unnormalized patient-specific clinical data. In certain aspects, the data is thus retained locally, and is not sent to the terminology server or any other server for processing.

[0016] In certain embodiments, the normalized patient-specific clinical data may include at least two of structured clinical data, claims data, and unstructured clinical data. In particular embodiments, the normalized patient-specific clinical data includes structured clinical data, claims data, and unstructured clinical data. The normalized patient-specific clinical data may be stored in the graph database in the absence of certain information, including but not limited to personal health information (PHI).
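As a minimal sketch of storing data in the absence of PHI, a record might be filtered before it is loaded into the graph database. The field names in `PHI_FIELDS` and the function `strip_phi` are assumptions made for this illustration; the disclosure does not enumerate PHI fields, and this sketch addresses only structured fields, not PHI embedded in free text.

```python
# Hypothetical field names treated as PHI for the purpose of this sketch.
PHI_FIELDS = frozenset({"name", "mrn", "address", "date_of_birth"})


def strip_phi(record: dict) -> dict:
    """Return a copy of the record with every PHI field removed."""
    return {key: value for key, value in record.items()
            if key not in PHI_FIELDS}


raw_record = {"name": "Jane Doe", "mrn": "000000", "diagnosis_code": "401.9"}
clean_record = strip_phi(raw_record)
```

The original record is left unmodified, so the data warehouse copy may retain its fields while only the filtered copy is passed on for loading.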

[0017] Embodiments of the systems disclosed herein may include a search interface. The search interface may be in communication with the graph database and the terminology server, wherein the processor is configured to allow a user to search the normalized patient-specific clinical data using one or more biomedical terminologies. A variety of search interfaces may be used. For example, the search interface may be a faceted search interface. The search interface may include a time-domain query interface. The search interface may include a geographical query interface.

[0018] These and other features will be apparent to the ordinarily skilled artisan upon reviewing the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] The invention may be best understood from the following detailed description when read in conjunction with the accompanying drawings. Included in the drawings are the following figures:

[0020] FIG. 1 is a flow diagram of a method of managing patient-specific data.

[0021] FIG. 2 is a flow diagram of a method of mapping ICD-9 medical diagnoses to ICD-10 medical diagnoses.

[0022] FIG. 3 is a flow diagram of a method of evaluating the quality of care given to a patient.

[0023] FIG. 4 is a graphical illustration of a SemanticGraphDB system.

[0024] FIG. 5 is a graphical illustration of a SemanticGraphDB system that includes a search interface.

[0025] FIG. 6 is a flowchart of an example SemanticGraphDB system.

[0026] FIG. 7 is a graphical illustration of a plurality of SemanticGraphDBs, each having a processor operatively coupled to a terminology server.

[0027] FIG. 8 is a graphical illustration of a plurality of SemanticGraphDBs, each having a processor operatively coupled to a terminology server, wherein the plurality includes a search interface.

[0028] FIGS. 9-13 are graphical illustrations of example displays that may be accessed by entities associated with a SemanticGraphDB.

DETAILED DESCRIPTION

[0029] Systems and methods of managing data via a Semantic Graph Database (SemanticGraphDB) are provided. Aspects of the methods include semantically harmonizing unnormalized patient-specific data after it has already been loaded into a data warehouse. Aspects further include constructing a retrospective record of a patient's health. Systems for use in practicing methods of the present disclosure are also provided.

[0030] Before the present invention is described in greater detail, it is to be understood that this invention is not limited to particular embodiments described, and as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

[0031] Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

[0032] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, some potential and exemplary methods and materials may now be described. Any and all publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. It is understood that the present disclosure supersedes any disclosure of an incorporated publication to the extent there is a contradiction.

[0033] It must be noted that as used herein and in the appended claims, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a graph database" includes a plurality of such graph databases and reference to "the terminology server" includes reference to one or more terminology servers, and so forth. Further, a reference to "datum" may include "data," and vice versa, unless the context clearly dictates otherwise.

[0034] It is further noted that the claims may be drafted to exclude any element which may be optional. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as "solely", "only" and the like in connection with the recitation of claim elements, or the use of a "negative" limitation.

[0035] The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed. To the extent such publications may set out definitions of a term that conflict with the explicit or implicit definition of the present disclosure, the definition of the present disclosure controls.

[0036] As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.

METHODS

[0037] As summarized above, aspects of the present disclosure include managing data via a SemanticGraphDB. In certain aspects, a feature of the data to be managed by the systems and methods provided herein is that the data includes patient-specific structured clinical datum, unstructured clinical datum, or claims datum.

[0038] The phrase "clinical data" is intended to be used broadly and generically to refer to patient-related or patient-specific clinical information such as the particular condition(s) from which a patient was suffering, is suffering, or previously suffered, or for which the patient was treated, is treated, or will be treated. Accordingly, by "structured clinical datum" is meant any such data in structured form that is collected and/or stored by a hospital or other health care provider relating to a patient. By "unstructured clinical datum" is meant any clinical data in unstructured form, such as with surgical pathology reports (text reports), that is collected and/or stored by a hospital or other health care provider relating to a patient.

[0039] The phrases "billing datum," "claims datum," and "billing claims datum" may be used interchangeably herein, and are meant to be used broadly and generically to refer to any information regarding invoicing, reimbursement, billing or payment for treatment received by a patient, such as CPT codes, ICD-9 codes, billed and reimbursed dollar amounts, medication charges, LOINC codes, and the like. Claims data may include, for example, billing codes for the particular treatments performed by the hospital, specifically including treatments for which clinical data was generated or stored.

[0040] Hospitals and other health care providers may currently store such information using data warehousing technology. For example, data may be stored via nursing, laboratory, and pharmacy databases. Data may also be stored as electronic medical records, which are sometimes consolidated into an IDR (integrated data repository) or CDW (clinical data warehouse). Images are often stored in a radiology picture archival and communication system (PACS). Such data warehousing technology has a number of limitations. For example, in traditional data warehousing technology, only structured data may be stored within a predefined schema. This restriction leads to complications when applying that technology to biomedical research applications, because in biomedical science the information model required is frequently not known at the time data is loaded into the warehouse. Further, tasks such as sharing data between different health care providers and aggregating data from a plurality of health care providers can be difficult, because the information may be stored in different or incompatible ways by the individual providers.

[0041] Moreover, combining disparate types of information is difficult. If a user wanted to establish a retrospective record of the true health of patients, for example, it may be difficult to combine such structured clinical datum, unstructured clinical datum, and claims datum using existing methodologies and systems. Combining two or more such types of information may be useful for establishing a more comprehensive picture of the overall health of patients than may be accomplished using only one source of information. For example, a term such as "MS," when used within a clinical finding, may have many different meanings. If the finding were a cardiology finding, one may conclude that MS stands for "mitral stenosis." If the finding were related to anesthesia, one may assume that MS stands for "morphine sulfate." There are likely hundreds or more of such domain-specific interpretations of clinical data. Additionally, multiple possible biological pathways or forms of environmental stress may cause the exact same "clinical phenotype." For example, an ITP patient may have low platelet counts due to Graves' disease, or due to exposure to Helicobacter pylori bacteria, and can sometimes be originally detected following abnormal serum liver tests; Graves' disease patients often have liver disease. If the original clinical findings are on the topic of Graves' disease or a bacterial infectious disease, then within which domain should the term "ITP" be later interpreted? The term "ITP" does not necessarily mean the same thing within these clinical domains, as it sometimes refers to "inosine triphosphate," which is associated with gene defects leading to SAEs after liver transplants. Accordingly, by aggregating multiple sources of information about the patient, in a semantically harmonized manner, the nature of clinical information may be made easier to interpret.
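The "MS" example above may be illustrated with a minimal sketch in which the clinical domain of a finding selects the interpretation of an abbreviation. The dictionary `LEXICON` and the function `interpret` are hypothetical; the two entries are the two readings named in the text, and a practical lexicon would be drawn from a terminology server rather than hard-coded.

```python
from typing import Optional

# Hypothetical domain-specific lexicon holding only the two readings
# of "MS" given in the text above.
LEXICON = {
    ("MS", "cardiology"): "mitral stenosis",
    ("MS", "anesthesia"): "morphine sulfate",
}


def interpret(term: str, domain: str) -> Optional[str]:
    """Return the meaning of a term within a clinical domain, if known."""
    return LEXICON.get((term.upper(), domain.lower()))
```

A lookup for a domain that the lexicon does not cover returns `None`, leaving the term uninterpreted rather than guessing.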

Methods of Semantically Harmonizing Unnormalized Patient-Specific Data

[0042] Aspects of the methods include semantically harmonizing unnormalized patient-specific data after it has already been loaded into a data warehouse. FIG. 1 presents a flow diagram of one example method of managing patient-specific data. In the method 100, patient-specific data is first obtained from a data warehouse (101). By "data warehouse" is meant any convenient means of storing such patient-specific data of any type, including, for example, a database, an EMR, IDR, CDW, PACS, and the like. Included in this definition are entity attribute value (EAV) based warehouses, such as i2b2 and the like. A feature of such data stored in the data warehouse is that it is "unnormalized." The term "unnormalized" is used herein to describe the state of data to which biomedical terminologies and/or ontologies have not been applied. Thus, data stored in an EMR including, for example, a doctor's handwritten notes is "unnormalized" if such data is not normalized to certain biomedical terms within specific biomedical topic areas, as shall be described in greater detail below.

[0043] Once the unnormalized patient-specific data is obtained from a data warehouse (101), it is normalized (105) by applying biomedical terminologies and/or ontologies. Such biomedical terminologies or ontologies may be obtained from a terminology server. A "terminology server," as used herein, is used broadly and generically to describe a computer-accessible resource which provides standardized, machine-readable terminologies and ontologies. In accord with this definition, a terminology server may, for example, be a computer physically connected to a database, a computer connected to a local network, a computer connected to a proprietary network, or a computer to which one may interface via a web portal. Any convenient means of accessing the information on the terminology server may be employed. As appreciated by one of skill in the art, the particular means of connecting to a terminology server (e.g. through an application programming interface (API)) will be dictated by the particular terminology server employed. For example, if the terminology server employed is NCBO BioPortal, the means of connecting to the terminology server may include using an API based on BioPortal REST services.
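The normalizing step (105) might be sketched as below, under stated assumptions. The class `InMemoryTerminologyServer` is a stand-in introduced for this illustration and is not the interface of BioPortal or of any other terminology server; the vocabulary name `demo-vocab` and the identifiers beginning with `DEMO:` are placeholders rather than real concept codes.

```python
from typing import Dict, Iterable, List, Optional


class InMemoryTerminologyServer:
    """Stand-in for a terminology server.

    A real server would typically be reached over a network API; this
    hypothetical class only returns dictionaries held in memory.
    """

    def __init__(self, terminologies: Dict[str, Dict[str, str]]):
        self._terminologies = terminologies

    def get_terminology(self, name: str) -> Dict[str, str]:
        # Return a copy so callers cannot change the server's state.
        return dict(self._terminologies[name])


def normalize(values: Iterable[str],
              terminology: Dict[str, str]) -> List[Optional[str]]:
    """Map each raw value to a concept identifier, or None if unmatched."""
    return [terminology.get(value.strip().lower()) for value in values]


server = InMemoryTerminologyServer({
    "demo-vocab": {
        "high blood pressure": "DEMO:0001",
        "hypertension": "DEMO:0001",
    },
})
vocab = server.get_terminology("demo-vocab")
normalized = normalize(
    ["Hypertension", " high blood pressure ", "unknown finding"], vocab)
```

In this sketch the terminology is fetched once and applied by the caller, so the raw values are processed locally and are not sent to the server.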

[0044] The data may be normalized (105) using the terminology server via any convenient means known in the art. Such means may include methods described by, for example, AL Rector, et al., Methods Inf Med. 1995 Mar; 34(1-2): 147-57; CG Chute, et al., Proc AMIA Symp. 1999: 42-6; AM Hogarth, et al., AMIA Annu Symp Proc. 2003; 2003: 861; and PL Whetzel, et al., Nucleic Acids Res. 2011 Jul; 39(Web Server issue): W541-5. Epub 2011 Jun 14; the disclosures of which are incorporated herein by reference.

[0045] The normalized data may then be translated (110) by processing with an ontology mapping engine, using the biomedical terminologies and ontologies obtained from the terminology server. A general description of ontology mapping is contained in, for example, Y. Kalfoglou and M. Schorlemmer. The Knowledge Engineering Review Journal (KER), (18(1)): 1-31, 2003; and Wynden R, et al. Ontology Mapping and Data Discovery for the Translational Investigator. AMIA CRI Summit 2010; the disclosures of which are incorporated herein by reference. Any convenient ontology mapping engine may be employed (e.g. Snoggle, Health Ontology Mapper (HOM), X-SOM, and the like). It is thus a feature of certain embodiments of the instant methods and systems that, by enabling the translation of instance data after information has been loaded into a data warehouse, the instant systems and methods alleviate the need to translate clinical information statically, and no longer require the employment of IT development staff to translate clinical data during the warehouse loading process. For example, clinical notes (pathology and radiology findings) must first be translated into clean lexicons from all possible clinical domains of interest. The same note is translated into clean lexicons specific to radiology, pathology, ob/gyn, orthopedics, and the like; later in the mapping process, the billing and clinical encounter data is examined to determine which specific areas of interest are relevant. In this example, if the customer is primarily billed as an orthopedics patient, then lexicon terms relevant to orthopedics are given precedence in subsequent maps. Additional maps then normalize lab data, medications, procedures, etc., and by building on each other these mapped results can be used to map data into a computable description of the patient encounter and a computable state of the patient's health at that time. For a further discussion, see Examples, below.
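The precedence rule in the orthopedics example might be sketched as follows. This is an assumption-laden illustration and not the behavior of HOM or of any other named mapping engine: the functions `dominant_domain` and `rank_domains` are hypothetical, "primarily billed" is taken to mean the most frequent domain among the billing records, and the lexicon terms are placeholders.

```python
from collections import Counter
from typing import Dict, List


def dominant_domain(billing_domains: List[str]) -> str:
    """Return the domain that appears most often in the billing records."""
    return Counter(billing_domains).most_common(1)[0][0]


def rank_domains(translations: Dict[str, List[str]],
                 billing_domains: List[str]) -> List[str]:
    """Order domains so that the primarily billed domain comes first.

    The remaining domains keep their original order because sorted()
    is stable.
    """
    top = dominant_domain(billing_domains)
    return sorted(translations, key=lambda domain: domain != top)


# The same note translated into placeholder lexicon terms per domain.
translations = {
    "radiology": ["radiology-term-1"],
    "orthopedics": ["orthopedics-term-1"],
    "pathology": ["pathology-term-1"],
}
billing = ["orthopedics", "orthopedics", "radiology"]
ranked = rank_domains(translations, billing)
```

Subsequent maps could then consult the domains in the order given by `ranked`. The sketch assumes at least one billing record is present; an empty list would raise an `IndexError` in `dominant_domain`.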

[0046] Finally, the normalized, translated data may be loaded (115) into a graph database. The data may be augmented with one or more relations that are loaded from a terminology server. A variety of graph databases are known in the art. Any convenient graph database may be employed, such as AllegroGraph, Cytoscape, SIREn, and the like.
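To illustrate how relations from a terminology server can augment loaded patient data, the following sketch uses a small in-memory set of triples. It is not the API of any graph database named above; the class `TripleStore`, the predicates `has_finding` and `is_a`, and the `DEMO:` identifiers are assumptions made for this example.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Minimal in-memory graph holding subject-predicate-object triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))


def descendants(store: TripleStore, concept: str) -> Set[str]:
    """Return the concept and everything below it via 'is_a' relations."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in store.triples:
            if predicate == "is_a" and obj == current and subject not in found:
                found.add(subject)
                frontier.append(subject)
    return found


def patients_with(store: TripleStore, concept: str) -> Set[str]:
    """Find patients whose findings fall under the given concept."""
    concepts = descendants(store, concept)
    return {subject for subject, predicate, obj in store.triples
            if predicate == "has_finding" and obj in concepts}


store = TripleStore()
store.add("patient:1", "has_finding", "DEMO:0002")  # patient-specific datum
store.add("patient:2", "has_finding", "DEMO:0003")
store.add("DEMO:0002", "is_a", "DEMO:0001")         # relation from terminology
```

Because the `is_a` relation is stored alongside the patient data, a query for the broader concept `DEMO:0001` also finds `patient:1`, whose finding was recorded under the narrower concept.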

[0047] Means of loading the data into a graph database may vary. Such loading may include, for example, direct import. Loading may include database-specific bulk import files for loading into the EAV database for subsequent data normalization with HOM and subsequent extraction into the graph database. The bulk import files are usually in a database-specific format, such as the bulk import file formats of Oracle, Sybase IQ, SQL Server, and the like. Data loaded into the database in this manner is unmodified and may be stored within the warehouse in the same format as it was read from the data source. Additionally, loading may include concept dimension files, which encode the location of the data within the data source as a simple hierarchical list of parent-child terms relating the name of the source, the name of the table, and the name of the source column from which the data was read.
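The hierarchical list of parent-child terms in a concept dimension file might be sketched as below. The slash-delimited path format, the function `concept_rows`, and the example source, table, and column names are assumptions for illustration; the disclosure does not specify a file layout.

```python
from typing import List, Tuple


def concept_rows(source: str, table: str, column: str) -> List[Tuple[str, str]]:
    """Build (parent_path, child_path) pairs from source to table to column."""
    rows = []
    parent = ""
    for part in (source, table, column):
        child = f"{parent}/{part}"
        # The top-level source hangs off a root path written as "/".
        rows.append((parent or "/", child))
        parent = child
    return rows


rows = concept_rows("EMR", "LAB_RESULTS", "GLUCOSE")
```

Each row names one parent and one child, so the location of a datum can be traced from the column back through its table to the originating source.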

[0048] In certain aspects, once loaded and translated, the data can then be exported to create a data mart that includes a subset of the translated clinical information to address a specific use case, for example ICD-9 codes mapped to specific ICD-10 codes as an automated crosswalk. In the next step, data from these data marts are loaded into a graph database for later access by a query interface. Once loaded, the graph database may be used to augment the translated information with relations housed within the same terminology server that was used for data translation. The combined total can then be queried using multiple user interfaces, such as Apache Solr, temporal query engines, etc.

Methods of Mapping ICD-9 Medical Diagnoses to ICD-10 Medical Diagnoses

[0049] In other aspects, methods are provided for mapping certain medical diagnoses using a first set of diagnosis codes to a second set of diagnosis codes. For example, in particular instances, methods are provided for mapping ICD-9 medical diagnoses to ICD-10 medical diagnoses.

[0050] FIG. 2 presents a flow diagram of an example of such a method. In this method 200, patient-specific data is first obtained from a data warehouse (201). The unnormalized patient-specific data is then normalized (205) by applying biomedical terminologies and ontologies, as described in greater detail above. This normalized patient-specific data is then translated (210) to associate ICD-9 codes with diagnoses, using an ontology mapping engine. This step 210 includes: loading the clinical encounter (including demographics data) and mapping that data into standard medical terminologies such as HL7 Discharge or the Harvard Demographics Ontology; loading the clinical claims data and, if necessary, mapping that into standard format; and loading the unstructured clinical findings data and mapping that information into domain-specific sections of standard medical terminologies, such as SNOMED CT.

[0051] This normalized, translated, and mapped data may then be loaded (215) into a graph database. One or more cross-walk algorithms may then be applied to the data to map (220) the ICD-9 diagnoses to ICD-10 diagnoses. Such cross-walks may be encoded in any convenient manner, such as encoding into BioPortal as many-to-one ontology instance maps. The results generated by each cross-walk include a warehouse of patient counts associated with each new ICD-10 code. Further examples of mapping certain medical diagnoses using a first set of diagnosis codes to a second set of diagnosis codes are also described in the Examples section, below.
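A single many-to-one cross-walk and the resulting patient counts might be sketched as follows. The dictionary `CROSSWALK` is a small illustrative map only, not a validated or complete crosswalk and not the encoding used in BioPortal; the function `patient_counts` and the patient identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Illustrative many-to-one map only; not a validated or complete crosswalk.
CROSSWALK = {
    "401.0": "I10",
    "401.1": "I10",
    "401.9": "I10",
    "250.00": "E11.9",
}


def patient_counts(diagnoses: Iterable[Tuple[str, str]],
                   crosswalk: Dict[str, str]) -> Dict[str, int]:
    """Count distinct patients per ICD-10 code.

    Each diagnosis is a (patient_id, icd9_code) pair. ICD-9 codes that
    the crosswalk does not cover are skipped.
    """
    patients: Dict[str, Set[str]] = defaultdict(set)
    for patient_id, icd9 in diagnoses:
        icd10 = crosswalk.get(icd9)
        if icd10 is not None:
            patients[icd10].add(patient_id)
    return {code: len(ids) for code, ids in patients.items()}


diagnoses = [
    ("p1", "401.1"),
    ("p1", "401.9"),   # same patient, second code mapping to the same target
    ("p2", "401.9"),
    ("p3", "250.00"),
    ("p4", "999.99"),  # not covered by the illustrative map
]
counts = patient_counts(diagnoses, CROSSWALK)
```

Patients are collected in sets so that a patient carrying several ICD-9 codes that map to the same ICD-10 code is counted once. Applying a plurality of cross-walks would amount to calling `patient_counts` once per map and comparing the resulting counts.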

Methods for Evaluating the Quality of Care Given to Patients

[0052] In other aspects, methods are provided for evaluating the quality of care given to a patient, or a plurality of patients. Briefly, such methods may include normalizing and translating patient-specific data for a plurality of patients; clustering the records based upon the particular diagnosis and outcome; and comparing the 'standard' care received by such patients with the care given to a particular patient, whereby one or more differences between the care received by the particular patient and the 'standard' care may be used at least in part to identify that the patient was or is at risk of receiving lesser quality of care.

[0053] FIG. 3 presents a flow chart diagram of an example of such a method. In this method 300, records for a plurality of patients are obtained (301). This plurality of records may be originally stored in unnormalized form in a data warehouse. Such data may be normalized (305) by applying biomedical terminologies and ontologies, and translated (310) with an ontology mapping engine, such steps described in greater detail above. Once normalized and translated, this plurality of patient-specific data may then be loaded (315) into a graph database. The records may then be grouped (321) based upon diagnosis and outcome. As would be apparent to one of skill in the art, any convenient means of grouping may be employed.

[0054] To analyze whether, for a given patient, the patient is at risk of receiving lesser quality of care, certain data regarding that patient must first be obtained (302). This data may be obtained (302) from a data warehouse, in unnormalized form. The unnormalized patient-specific data may be normalized (305) by applying biomedical terminologies and ontologies, and translated (310) with an ontology mapping engine, such steps described in greater detail above. Once normalized and translated, the patient's data may then be loaded (315) into a graph database.

[0055] Identification of whether the patient was, or is, at risk of receiving lesser quality of care may be performed by identifying (322) one or more differences between the care received by the particular patient and the care received by the plurality of patients with the same diagnosis who had positive outcomes. For example, using a faceted query running on SemGraphDB, it is possible to identify the percentage of patients diagnosed with spine problems who are at risk of being readmitted for a revision, adjusted for risks, after having been treated with a laminectomy procedure in one of the previous visits.
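By way of a non-limiting illustration, the grouping (321) and difference identification (322) steps may be sketched as follows. The record layout, the field names and the `min_share` threshold are hypothetical; as noted above, any convenient means of grouping may be employed.

```python
from collections import Counter


def standard_care(records, diagnosis, min_share=0.5):
    """Care items received by at least min_share of the positive-outcome
    patients with the given diagnosis (an assumed definition of 'standard')."""
    cohort = [r for r in records
              if r["diagnosis"] == diagnosis and r["outcome"] == "positive"]
    if not cohort:
        return set()
    # Count each care item at most once per patient record.
    counts = Counter(item for r in cohort for item in set(r["care"]))
    return {item for item, n in counts.items() if n / len(cohort) >= min_share}


def care_differences(patient, records, min_share=0.5):
    """Standard care items that the given patient did not receive."""
    expected = standard_care(records, patient["diagnosis"], min_share)
    return expected - set(patient["care"])
```

A non-empty result from `care_differences` could then be used, at least in part, as a signal that the patient warrants review.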

Access Control

[0056] In each of the methods of the present disclosure, an access control device may be employed as part of the method. Such an access control device may be used at least in part to allow or deny access to particular biomedical terminologies or ontologies. Whether to allow or deny access to particular biomedical terminologies or ontologies may be based on a number of factors, including but not limited to whether such terminologies or ontologies are public or proprietary, whether the user has purchased a license permitting such access, and whether such terminologies or ontologies meet some predefined threshold of reliability. The threshold of reliability may be determined by the user, or the community of users. That is, in certain instances, a user may choose not to use biomedical terminologies or ontologies that have not previously been used by a certain number of other users, have not been reviewed by a sufficient number of reviewers, and the like.

[0057] Further, in each of the methods described, a step of searching the graph database loaded with normalized and translated patient-specific data is contemplated. Any convenient search interface may be employed, such as a faceted search environment, a time-domain query interface, and the like.
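By way of a non-limiting illustration, the allow-or-deny decision made by the access control device described above may be sketched as follows. The `Ontology` fields and the threshold parameters are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Ontology:
    name: str
    proprietary: bool
    prior_users: int   # number of other users who have used the ontology
    reviews: int       # number of reviewers who have reviewed it


def may_access(ontology, licensed_names, min_users=0, min_reviews=0):
    """Allow access only if licensing and reliability thresholds are met."""
    # Proprietary ontologies require a purchased license.
    if ontology.proprietary and ontology.name not in licensed_names:
        return False
    # User-chosen reliability thresholds.
    return ontology.prior_users >= min_users and ontology.reviews >= min_reviews
```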

[0058] Such methods may also include a step of removing particular information from the unnormalized patient-specific data before such data is loaded into the graph database. For example, if the unnormalized patient-specific data stored in the data warehouse includes personal health information (PHI), a step of removing such PHI may be employed prior to loading the data into the graph database, so that the graph database is populated with data free of PHI. PHI may be removed by any convenient means. In certain embodiments, HIPAA protected patient identifiers may be replaced by proxy IDs. For example, proxy identifiers replace the patient's name, the name of his physician, and the patient's social security number during the data loading process. Further, if the same HIPAA protected source information is encountered from any subsequent data source the same proxy identifiers are returned. This allows the graph database to link patient data without any need to store the PHI within the database. By using this technique it is possible to build a warehouse that is a HIPAA Limited Data set that contains only limited dates of service and which has a HIPAA de-identified user interface. By populating the database with proxy identifiers, instead of HIPAA protected patient data, the patient records may be linked from multiple sources. This linkage is possible because when the same patient data is encountered, the same proxy identifier is reused, regardless of the source of the HIPAA protected information.
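By way of a non-limiting illustration, the replacement of protected identifiers by reusable proxy identifiers may be sketched as follows. The class name, the field names and the normalization rule (trimming and lower-casing) are assumptions made for this sketch; in practice the lookup table that associates PHI with proxy IDs would be held separately from the graph database.

```python
import uuid


class ProxyGenerator:
    """Hypothetical in-memory stand-in for a proxy identifier service."""

    def __init__(self):
        # Maps (field kind, normalized PHI value) to a proxy ID. This table
        # is the only place where PHI and proxy IDs are associated.
        self._proxy_by_phi = {}

    def proxy_for(self, kind, value):
        key = (kind, value.strip().lower())
        if key not in self._proxy_by_phi:
            self._proxy_by_phi[key] = f"{kind}-{uuid.uuid4().hex}"
        # The same PHI always yields the same proxy, whatever the source.
        return self._proxy_by_phi[key]


def deidentify(record, gen,
               phi_fields=("patient_name", "physician_name", "ssn")):
    """Return a copy of the record with PHI fields replaced by proxy IDs."""
    return {field: gen.proxy_for(field, value) if field in phi_fields else value
            for field, value in record.items()}
```

Because the proxy for a given value is stable, records for the same patient arriving from different sources can be linked on the proxy ID alone.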

[0059] Further, though such methods have been described primarily in terms of a single data warehouse and a single graph database, such limitation is made solely for simplicity in explaining the methods, and is not intended to be limiting. Any practical number of data warehouses, graph databases, and terminology servers may be employed in the above methods. For example, the methods 100, 200, 300 and the like may each obtain unnormalized patient-specific data from 2, 3, 4, 5, 6, 7, 8, 9, 10 or more data warehouses. No limitations are contemplated for the ownership or placement of such data warehouses: such data warehouses may be part of the same hospital or health care provider, may include information from different providers, may be spaced apart geographically, and the like. Each method may use 2, 3, 4, 5, 6, 7, 8, 9, 10 or more terminology servers. The normalized and translated patient-specific data may be loaded into 2, 3, 4, 5, 6, 7, 8, 9, 10 or more graph databases.

SYSTEMS

[0060] As summarized above, aspects of the present disclosure include managing data via a SemanticGraphDB. In some instances, such systems may practice methods of the present disclosure, described in greater detail above.

[0061] FIG. 4 presents an example system 10. The system includes a data warehouse 12, which includes unnormalized patient-specific clinical data. Such data may include, for example, structured clinical datum, unstructured clinical datum, or claims datum. Such data may include at least two of structured clinical data, claims data, and unstructured clinical data. In certain aspects, the data may include structured clinical data, claims data, and unstructured clinical data.

[0062] The data warehouse 12 is operatively coupled to a processor 14. The processor 14 is also operatively coupled to a terminology server 13, and a graph database 15. Features of terminology servers and graph databases that may be employed in systems of the present disclosure are described in greater detail above, under Methods. The processor may be configured to populate the graph database 15 with normalized patient-specific clinical data and relations by normalizing patient-specific clinical data from the data warehouse 12 using terminologies and ontologies retrieved from the terminology server 13. The graph database 15 thus stores normalized patient-specific clinical data, wherein the data includes structured clinical datum, unstructured clinical datum, or claims datum; and one or more relations between each clinical datum stored for a patient and one or more biomedical terminologies.
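By way of a non-limiting illustration, the population of the graph database with normalized data and relations may be sketched as a set of subject-relation-object edges. The edge labels, the identifier scheme and the `term_lookup` dictionary (standing in for terms retrieved from the terminology server) are hypothetical.

```python
def populate_graph(raw_facts, term_lookup):
    """Build graph edges from raw warehouse facts.

    raw_facts: iterable of (patient_id, source_code) pairs.
    term_lookup: dict mapping a source code to a standard terminology term.
    """
    edges = set()
    for patient_id, source_code in raw_facts:
        datum = f"datum:{patient_id}:{source_code}"
        # Each clinical datum is attached to its patient.
        edges.add((f"patient:{patient_id}", "has_datum", datum))
        term = term_lookup.get(source_code)
        if term is not None:
            # Relation between the datum and a biomedical terminology term.
            edges.add((datum, "maps_to", term))
    return edges
```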

[0063] FIG. 5 presents the example system 10, further including a search interface 17. The search interface 17 is in communication with the processor 14 and the terminology server 13. Further, the processor 14 is configured to allow a user to search the normalized patient-specific clinical data contained in the graph database 15 using one or more biomedical terminologies that may be contained in the terminology server 13. Any convenient type of search interface may be employed, such as a faceted search interface, a time-domain query interface, a geographical query interface, and the like.

[0064] FIG. 6 presents a non-limiting example implementation of a system according to the present disclosure. In this system, the data warehouse includes an IDR, such as that provided by the i2b2 platform. The system employs the Health Ontology Mapper (HOM) as an ontology mapping engine. The processor in communication with the IDR and HOM is operatively coupled to the NCBO BioPortal terminology server. The processor may thus populate a graph database (SIREn) with patient-specific data that has been normalized and translated by using terminologies and ontologies retrieved from the BioPortal terminology server. The graph database may be searched by using, for example, Apache Solr as a faceted search interface.

[0065] FIGS. 9-13 provide graphical illustrations of example displays that may be accessed by entities associated with a SemanticGraphDB, such as that presented in FIG. 6. FIG. 9 shows an example faceted search interface screen. This display includes hyperlinked facets for filtering. Such facets may, in some instances, display particular counts. Turning to FIG. 10, an example ontology is presented for viewing structured clinical data. This display shows how the ontology has been used to transform and filter demographics data. FIGS. 11, 12 and 13 show how ontologies may be used to transform and filter billing data and unstructured clinical data. The ontologies of interest that address the use case (orthopedics-specific and demographics ontologies, for example) are extracted from BioPortal and presented on the Apache Solr interface for use as query parameters to be searched in patient information such as demographics, diagnosis codes, procedure codes, LOINC codes, clinical text (regardless of its size, as long as the underlying hardware can accommodate it), and the annotations extracted from the text. The query results (the frequency distribution of patient counts based on certain parameters of interest, such as the disposition codes, admitting department, attending physician IDs, etc.) are then presented to the user as facets to assist in further refinement of the queries, so that the user can arrive at the answer sought.
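By way of a non-limiting illustration, the facet counts presented to the user may be sketched as a distribution of distinct patients per facet value, computed over the records that match the filters selected so far. This is a plain in-memory sketch with hypothetical field names; it does not use or represent the Apache Solr API.

```python
def facet_counts(records, facet, filters=None):
    """Count distinct patients per value of `facet` among matching records.

    records: iterable of dicts, each containing a "patient_id" key.
    filters: dict of field -> required value (facets already selected).
    """
    filters = filters or {}
    patients_by_value = {}
    for record in records:
        matches = all(record.get(k) == v for k, v in filters.items())
        if matches and facet in record:
            patients_by_value.setdefault(record[facet], set()).add(
                record["patient_id"])
    return {value: len(ids) for value, ids in patients_by_value.items()}
```

Selecting a facet value adds it to `filters`, and the counts for the remaining facets are then recomputed over the narrowed result set.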

[0066] Systems of the present disclosure may include a plurality of data warehouses, processors, graph databases, and the like. Turning now to FIG. 7, an example system 20 is depicted which includes two data warehouses (22 and 32), two processors (24 and 34), two graph databases (25 and 35), and a terminology server 23. In this particular system, the data warehouse 22 is operatively coupled to the processor 24. The processor 24 is also operatively coupled to a terminology server 23, and a graph database 25. The processor may be configured to populate the graph database 25 with normalized patient-specific clinical data and relations by normalizing patient-specific clinical data from the data warehouse 22 using terminologies and ontologies retrieved from the terminology server 23. The graph database 25 thus stores normalized patient-specific clinical data, wherein the data includes structured clinical datum, unstructured clinical datum, or claims datum; and one or more relations between each clinical datum stored for a patient and one or more biomedical terminologies.

[0067] Similarly, the data warehouse 32 is operatively coupled to the processor 34. The processor 34 is also operatively coupled to a terminology server 23, and a graph database 35. The processor may be configured to populate the graph database 35 with normalized patient-specific clinical data and relations by normalizing patient-specific clinical data from the data warehouse 32 using terminologies and ontologies retrieved from the terminology server 23. The graph database 35 thus stores normalized patient-specific clinical data, wherein the data includes structured clinical datum, unstructured clinical datum, or claims datum; and one or more relations between each clinical datum stored for a patient and one or more biomedical terminologies.

[0068] Thus, the data stored in graph databases 25 and 35 has been normalized using terminologies and ontologies obtained from the same terminology server 23. The data stored in graph database 25 should therefore be semantically harmonized with that of graph database 35. Accordingly, data from graph database 25 may be readily aggregated with that of graph database 35. That is, the use of a shared terminology server 23 may allow multiple hospitals or other health care providers to translate clinical information by leveraging the same definitions of clinical terminology, the same data dictionaries representing source clinical software environments and the same set of instance maps used to translate clinical information into standard clinical terminology.

[0069] FIG. 8 presents the example system 20, further including a search interface 27. The search interface 27 is in communication with the processor 24, the processor 34, and the terminology server 23. Processors 24 and 34 are configured to allow a user to search the normalized patient-specific clinical data contained in the graph databases 25 and 35 using one or more biomedical terminologies that may be contained in the terminology server 23. Any convenient type of search interface may be employed, such as a faceted search interface, a time-domain query interface, a geographical query interface, and the like.

[0070] As described above, systems of the present disclosure may include a plurality of data warehouses, processors, graph databases, and the like. Any convenient number of data warehouses, processors, graph databases, and the like are contemplated. For example, a system may include 2, 3, 4, 5, 6, 7, 8, 9, 10 or more data warehouses. No limitations are contemplated for the ownership or placement of such data warehouses: such data warehouses may be part of the same hospital or health care provider, may include information from different providers, may be spaced apart geographically, and the like. A system may use 2, 3, 4, 5, 6, 7, 8, 9, 10 or more terminology servers. The normalized and translated patient-specific data may be loaded into 2, 3, 4, 5, 6, 7, 8, 9, 10 or more graph databases. Likewise, a system may include 2, 3, 4, 5, 6, 7, 8, 9, 10 or more processors. In some instances, the system may be a distributed grid system.

EXAMPLES

[0071] As can be appreciated from the disclosure provided above, the present disclosure has a wide variety of applications. Accordingly, the following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Those of skill in the art will readily recognize a variety of noncritical parameters that could be changed or modified to yield essentially similar results. Efforts have been made to ensure accuracy with respect to numbers used but some experimental errors and deviations should be accounted for.

EXAMPLE 1: MAPPING ICD-9 MEDICAL DIAGNOSES TO ICD-10 FORMAT

[0072] Currently in the United States, medical diagnoses are encoded in ICD-9 format. Unlike in many other countries, the United States also has many payers that reimburse medical providers based on these diagnosis codes. In the year 2012, the United States will switch from the current ICD-9 format to the ICD-10 terminology for describing diagnoses. ICD-10 has over 10 times as many terms as does ICD-9, and may provide a record of patient diagnosis which is far more medically relevant than are the existing ICD-9 codes.

[0073] Due to the nature of the "many payer" medical system in the United States, the transition from ICD-9 to ICD-10 could pose a public health challenge. Each year medical centers negotiate and re-negotiate their reimbursement rates with payers relative to these codes. As the deadline for switching over to the ICD-10 standard approaches, that negotiation process will need to be based on the new ICD-10 standard.

[0074] However, it is not currently possible for a provider to determine the distribution of ICD-10 codes that should be expected in the coming years. Also, since there are many payers in the USA, each of which has unique state legal mandates, the remediation process followed by payers needs to follow a consistent algorithm based on the distribution of ICD-10 codes for which payments are to be made. But if the payers cannot predict the distribution of ICD-10 codes used during the remediation process, then the actuarial analysis generated by the payer would be inaccurate. This creates an environment that is highly unpredictable for both the providers and the payers as they negotiate their transition from ICD-9 to ICD-10.

[0075] A common medical term "cross-walk" algorithm was created to predict that future ICD-10 distribution. This set of ontology maps negotiates the semantic gap by looking at the existing primary and secondary ICD-9 codes and using that diagnosis as context when selecting from a set of annotated clinical findings and mappings of clinical encounter and demographics data. By translating clinical data first into standard biomedical terminologies and then using the context provided by the existing set of ICD-9 codes, the cross-walk determines the equivalent ICD-10 code that would have been expected had the same encounter occurred in 2012. This includes performing a number of tasks, including: loading the clinical encounter data (including demographics data) and mapping that data into standard medical terminologies such as HL7 Discharge, or the Harvard Demographics Ontology; loading the clinical claims data and, if necessary, mapping that into standard format; and loading the unstructured clinical findings data and mapping that information into domain specific sections of standard medical terminologies, such as SNOMED/CT.

[0076] Once all of the source data was loaded and mapped into standard vocabularies, the ICD-10 terms were categorized by importance. Only those codes that are likely to be associated with the medical centers' highest costs, based on the recommendations of the medical center finance departments, were selected. Cross-walks were constructed that map from a set of standard medical terminologies representing the claims, encounter and findings data. These cross-walks were encoded onto BioPortal as many-to-1 ontology instance maps. This process was repeated at several medical centers, including UCSF, Stanford, UT Houston and U Penn.

[0077] The results generated by each cross-walk include a warehouse of patient counts associated with each new ICD-10 code. Using this warehouse of retrospective data that has been mapped to ICD-10, one can then determine the likely distribution of ICD-10 codes that will be encountered subsequent to 2012.

EXAMPLE 2: SEMANTICGRAPHDB SYSTEM

[0078] Figure 6 depicts a flowchart of an example SemanticGraphDB system. In this particular implementation, the data warehouse includes an IDR, such as that provided by the i2b2 platform. The system employs the Health Ontology Mapper (HOM) as an ontology mapping engine. The processor in communication with the IDR and HOM is operatively coupled to the NCBO BioPortal terminology server. The processor may thus populate a graph database (SIREn) with patient-specific data that has been normalized and translated by using terminologies and ontologies retrieved from the BioPortal terminology server. The graph database may be searched by using, for example, Apache Solr as a faceted search interface.

[0079] A feature of this particular implementation is the use of HOM as an ontology mapping engine. The HOM leverages a single terminology server to allow multiple hospitals or other health care providers to translate clinical information by leveraging the same definitions of clinical terminology, the same data dictionaries representing source clinical software environments and the same set of instance maps used to translate clinical information into standard clinical terminology. Each instance of HOM connects to the terminology server using an API (Application Program Interface) based on BioPortal REST services. These REST services have been extended to support HOM queries for clinical instance data maps. HOM can query these services in a dynamic fashion, allowing the application of instance maps to clinical data to occur after the data has already been loaded into a warehouse.

Data Loading

[0080] HOM facilitates the loading of clinical data into a warehouse to enable further analysis. The current version of HOM specifically requires the usage of i2b2 as a warehouse platform. By enabling the translation of instance data after information has been loaded into a warehouse, HOM alleviates the need to translate clinical information statically and no longer requires the employment of IT development staff to translate clinical data during the warehouse loading process. Traditional warehouse data loading is called ETL (Extract Transform Load) processing whereas HOM uses ELT (Extract Load Transform) processing using a component called HOM UETL (Universal ETL).

[0081] The HOM UETL process generates two sets of files. First it generates bulk import files for loading the warehouse. These bulk import files are a native database format supported by all database vendors and UETL currently supports the bulk import file formats of Oracle, Sybase IQ and SQL Server. Bulk import files are the fastest possible means of importing data into a warehouse. Data loaded into the warehouse in this manner is unmodified and is stored within the warehouse in the same format as it was read from the data source.

[0082] The second set of files generated by UETL is the concept dimension files. These concept dimension files encode the location of the data within the data source as a simple hierarchical list of parent child terms relating the name of the source, the name of the table and the name of the source column from which the data was read.

DataSourceName \ TableName \ ColumnName

[0083] Using this simple representation for the location from which the source data was loaded, both a concept path for the data warehouse and a URI (uniform resource identifier) can be constructed for the NCBO BioPortal. Concept dimension files are generated in bulk import format and can therefore be directly loaded into the i2b2 warehouse to provide a complete concept ID for each of the facts stored. The concept dimension files are then also loaded into Protege Mapping Master, a program used to load information into NCBO BioPortal and to translate hierarchical terms into OWL (Web Ontology Language) format. The resulting OWL based representation of the data source is then also loaded into NCBO BioPortal.
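By way of a non-limiting illustration, the construction of both a concept path and a URI from the same source location may be sketched as follows. The base URI and the exact path and URI layouts are hypothetical placeholders; they are not the formats actually used by i2b2 or NCBO BioPortal.

```python
from urllib.parse import quote

# Hypothetical base URI, used for illustration only.
BASE_URI = "http://example.org/ontology/"


def concept_path(source, table, column):
    """Hierarchical parent-child path: DataSourceName \\ TableName \\ ColumnName."""
    return "\\".join((source, table, column))


def concept_uri(source, table, column, base=BASE_URI):
    """URI built from the same three components, percent-encoding each one."""
    return base + "/".join(quote(part, safe="") for part in (source, table, column))
```

Because both identifiers are derived from the same three components, a fact stored under a given concept path can be matched to the corresponding term on the terminology server.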

[0084] Once the concept dimension files and raw source data have all been loaded, both the i2b2 warehouse and the BioPortal reference the same concept identifiers for source data based on their common set of concept paths.

System Access Using Traditional IT Technology

[0085] HOM also includes a feature called HOM ViewCreator that provides the capability to make the results of mapped data more easily accessible by researchers. The ViewCreator can allow access to mapped data, using JDBC, from within Microsoft Excel or biostatistics packages such as SAS, STATA, SPSS and R. ViewCreator based views can also be used to build downstream databases or to load information into data mining tools such as Cognos or Business Objects.

Personal Health Information Data Handling for HIPAA Compliance

[0086] The UETL data loading process also includes methods for the removal of PHI (personal health information) from the incoming source data. That feature is referred to as the ProxyGen Service that replaces PHI with proxy identifiers.

[0087] Specifically, when the UETL process loads data from clinical sources it replaces HIPAA protected patient identifiers with proxy IDs. For example, proxy identifiers replace the patient's name, the name of his physician, and the patient's social security number during the data loading process. Further, if the same HIPAA protected source information is encountered from any subsequent data source the same proxy identifiers are returned. This allows the data warehouse to link patient data without any need to store the PHI within the warehouse. By using this technique it is possible to build a warehouse that is a HIPAA Limited Data set that contains only limited dates of service and which has a HIPAA de-identified user interface. By supporting the ProxyGen feature HOM can greatly lower the potential legal liability of using patient data for research or other purposes such as quality improvement.

[0088] The ProxyGen service also provides the ability to de-identify any downstream database that is connected to the data warehouse. This is provided via a set of REST services (a web based application program interface) that can be called by any database that extracts information from the warehouse. The ProxyGen REST services allow PHI to be submitted from downstream databases as well as from UETL loaded source databases. Downstream databases can send PHI to the ProxyGen service and if the same PHI data are submitted the same proxy values will again be returned as those used previously during the UETL loading process. By providing this service ProxyGen not only scrubs PHI from any incoming clinical data source but it can also remove PHI from any downstream database that is connected to the data warehouse as well. The ProxyGen REST services eliminate the need to retain PHI within any warehouse or warehouse-connected database, as PHI is no longer required for record linkage.

[0089] Additionally, by using the ProxyDB a report can be created that allows investigators to contact patients. Investigators may supply a list of proxy IDs for patients that they are interested in. If IRB (Institutional Review Board) approval to contact those patients has been provided, then by accessing the ProxyDB a listing of patient contact information for those patients can be provided. This is possible because the ProxyDB contains an association between the proxy IDs and each patient's PHI.

Unstructured Text Handling During Load

[0090] The HOM UETL component also optionally contains an embedded copy of the NCBO Annotator service for annotating unstructured text. By using Annotator, clinical findings extracted from source clinical environments can be annotated with BioPortal medical terminologies such as SNOMED/CT. The annotator feature supports named entity recognition and negation. Annotator is not a fully featured NLP (natural language processing) environment but instead is packaged as an automated annotation component used internally by HOM and only during the data loading process. When HOM runs Annotator on incoming full-text (unstructured) data it first identifies a set of BioPortal URIs for portions of medical terminologies stored on BioPortal. HOM selects multiple URIs to be annotated for topic areas of interest so that the same unstructured data can be interpreted within multiple contexts. For example, if HOM uses Annotator to select terms of interest in Cardiology, Orthopedic Surgery, and Pediatrics then annotations would be subsequently generated on the same unstructured text multiple times, once for each of those 3 domains. In this manner HOM UETL can select specific types of unstructured clinical findings and annotate those findings for usage within multiple domains of interest.
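By way of a non-limiting illustration, annotating the same unstructured text once per domain of interest, with a simple negation check, may be sketched as follows. This toy dictionary matcher is not the NCBO Annotator and does not use its API; the term identifiers, the domain dictionaries and the negation cue list are hypothetical.

```python
import re

# Hypothetical per-domain term dictionaries (phrase -> placeholder term ID).
DOMAIN_TERMS = {
    "cardiology": {"chest pain": "TERM:chest_pain"},
    "orthopedics": {"fracture": "TERM:fracture"},
}

# A negation cue followed by at most two words directly before the phrase.
NEGATION = re.compile(r"\b(no|denies|without)\s+(?:\w+\s+){0,2}$")


def annotate(text, domain_terms=DOMAIN_TERMS):
    """Return (domain, term_id, negated) for each dictionary phrase found."""
    lowered = text.lower()
    annotations = []
    for domain, terms in domain_terms.items():
        for phrase, term_id in terms.items():
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, lowered):
                preceding = lowered[:match.start()]
                negated = bool(NEGATION.search(preceding))
                annotations.append((domain, term_id, negated))
    return annotations
```

Each domain produces its own set of annotations over the same text, so a single clinical note can be interpreted within several contexts.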

Instance Mapping

[0091] After the data is loaded HOM and the BioPortal can then be used to dynamically translate warehoused information by traversing maps defined on BioPortal. These maps translate information from source data format into standard medical terminologies. For example, the local hospital discharge data stored within both GE UCare as well as within EPIC can be translated into the same HL7 Discharge Disposition format. Subsequent maps that utilize discharge disposition can then reference the standard HL7 Discharge format. After a map has run the same data exists within the warehouse in both its raw untranslated form and in one or more translated standard medical terminologies. Additional mappings of the same source data can then be added at any time in the future without any need to reload the source data.
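By way of a non-limiting illustration, translating data after it has been loaded, so that the raw and translated forms coexist, may be sketched with an in-memory SQLite database. The table names, column names and code values are hypothetical and do not represent the i2b2 schema or the BioPortal map format.

```python
import sqlite3


def load_and_translate(raw_rows, map_rows):
    """Load raw facts unmodified, then translate them with a join.

    raw_rows: (patient, concept_path, value) tuples as read from the source.
    map_rows: (concept_path, source_value, target_term) instance map entries.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE raw_fact (patient TEXT, concept_path TEXT, value TEXT);
        CREATE TABLE instance_map (concept_path TEXT, source_value TEXT,
                                   target_term TEXT);
        """
    )
    # Extract and Load: the source data is stored exactly as it was read.
    conn.executemany("INSERT INTO raw_fact VALUES (?, ?, ?)", raw_rows)
    # Transform: maps can be added at any later time without reloading.
    conn.executemany("INSERT INTO instance_map VALUES (?, ?, ?)", map_rows)
    rows = conn.execute(
        """
        SELECT f.patient, f.value, m.target_term
        FROM raw_fact AS f
        LEFT JOIN instance_map AS m
          ON m.concept_path = f.concept_path AND m.source_value = f.value
        ORDER BY f.patient
        """
    ).fetchall()
    conn.close()
    return rows
```

Unmapped values are returned with a target of `None`, so the raw form is never lost and further maps can be applied later.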

[0092] The HOM Interpreter dynamically translates local clinical instance data by communicating with the BioPortal REST services API (application program interface). This translation into standard ontologies happens when requested by the researcher and after the data has already been loaded into the warehouse.

[0093] The instance maps stored on BioPortal can define three different classes of clinical instance data maps, including 1-to-1 maps; many-to-1 maps; and automatic maps (many-to-many). The HOM 1-to-1 maps will translate a single term within the value set of the source data system into a single term for the value set of the target medical terminology. The HOM many-to-1 maps will look for the presence of multiple value set terms from the source data and translate that information into a single target terminology term. These 1-to-1 and many-to-1 maps are defined using Protege and the BioPortal web interface.
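By way of a non-limiting illustration, the application of 1-to-1 and many-to-1 maps to a set of source terms may be sketched as follows. The map representations and the placeholder term names are assumptions made for this sketch.

```python
def apply_maps(source_terms, one_to_one, many_to_one):
    """Translate source terms into target terminology terms.

    one_to_one: dict mapping a single source term to a single target term.
    many_to_one: list of (required_source_terms, target_term) pairs; the
        target applies only when every required source term is present.
    """
    present = set(source_terms)
    targets = {one_to_one[term] for term in present if term in one_to_one}
    targets |= {target for required, target in many_to_one
                if set(required) <= present}
    return targets
```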

[0094] Automatic maps allow a terminologist to check-in algorithms that execute on source data to determine the target terms. Examples of these "auto maps" include the normalization of clinical lab data into bins of "Low", "Low-Normal", "Normal", "High-Normal" and "High". Automatic maps can also include calls to third party terminology servers such as RxNav and may contain biostatistical programs or calls to machine learning libraries.
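By way of a non-limiting illustration, an automatic map that normalizes a lab value into one of the five bins named above may be sketched as follows. The binning rule (a margin expressed as a fraction of the reference range) is an assumption made for this sketch and is not a clinical standard.

```python
def bin_lab_value(value, low, high, margin=0.1):
    """Bin a lab value against a reference range [low, high].

    Values within `margin` (a fraction of the range width) of either limit
    are labeled "Low-Normal" or "High-Normal".
    """
    band = (high - low) * margin
    if value < low:
        return "Low"
    if value < low + band:
        return "Low-Normal"
    if value <= high - band:
        return "Normal"
    if value <= high:
        return "High-Normal"
    return "High"
```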

[0095] The above-mentioned HOM architecture has been implemented repeatedly at multiple institutions. Most recently it was used to implement the CELDAC (Comparative Effectiveness Large Dataset Analytics Core) grant that provided a warehoused and mapped form of the State of California OSHPD database for the rapid analysis of data collected for public health research.

[0096] At a systems level HOM has several features that may enable it to be used in the methods and systems provided herein. First, HOM's approach is to access medical terminology in real time on terminology servers. There are no XML files deployed which may cause synchronization issues. Second, since access to terminology servers is always over web-based REST services and referenced by URI, it is possible to use URIs to access portions of medical terminology that constitute subsets. These subsets of terms can then be used to define terms within very specific domains for the context specific interpretation of biomedical data. Third, HOM allows the same data to be interpreted and re-interpreted many times and within different contexts. Fourth, HOM processing is usually configured as a background batch process at off-peak processing times. It maps local instance data into standard terminologies off-line so that once mapped the queries for those translated forms run quickly. Fifth, terminology servers such as NCBO BioPortal can be viewed as a kind of content management system for medical terms. As such, multiple terminologists can collaborate on the terminology server to more efficiently define terminology used for multiple projects simultaneously. Sixth, since all HOM based instance-mapping sites access the same terminology server, medical terminologies and instance maps can be re-used at multiple sites, further increasing the efficiency of their usage. Finally, HOM supports 3 levels of complexity for instance maps: 1-to-1 maps, many-to-1 maps and auto-maps. This view of the mapping process sends the algorithm that describes how to handle data to the clinical data for processing. HOM does not require the sending of clinical data to the algorithm for processing, as in cloud-based analytics approaches.

[0097] Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings of this disclosure that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

[0098] Accordingly, the preceding merely illustrates the principles of the invention. It will be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope. Furthermore, all examples and conditional language recited herein are principally intended to aid the reader in understanding the principles of the invention and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure. The scope of the present invention, therefore, is not intended to be limited to the exemplary embodiments shown and described herein. Rather, the scope and spirit of the present invention are embodied by the appended claims.

Claims

What is claimed is:

1. A medical informatics system that implements a Semantic Graph Database (SemanticGraphDB), the system comprising:

a graph database for storing:

(a) normalized patient-specific clinical data, the data comprising structured clinical datum, unstructured clinical datum, or claims datum; and

(b) one or more relations between each clinical datum stored for a patient and one or more biomedical terminologies;

a data warehouse comprising unnormalized patient-specific clinical data;

a terminology server comprising biomedical terminologies and ontologies; and

a processor operatively coupled to the terminology server, the data warehouse and the graph database, and configured to populate the graph database with normalized patient-specific clinical data and relations by normalizing patient-specific clinical data from the data warehouse using terminologies and ontologies retrieved from the terminology server.

2. The system of claim 1, wherein the normalizing comprises mapping a patient-specific clinical datum with a terminology retrieved from the terminology server, and returning the mapped value.

3. The system of claim 1 or 2, wherein the normalizing comprises the terminology server sending an algorithm to the processor, the algorithm used by the processor to normalize unnormalized patient-specific clinical data.

4. The system of any of claims 1-3, wherein the normalized patient-specific clinical data comprises at least two of structured clinical data, claims data, and unstructured clinical data.

5. The system of any of claims 1-4, wherein the normalized patient-specific clinical data comprises structured clinical data, claims data, and unstructured clinical data.

6. The system of any of claims 1-5, wherein the normalized patient-specific clinical data is stored in the graph database in the absence of personal health information (PHI).

7. The system of any of claims 1-6, comprising a search interface in communication with the graph database and the terminology server, wherein the processor is further configured to allow a user to search the normalized patient-specific clinical data using one or more biomedical terminologies.

8. The system of claim 7, wherein the search interface is a faceted search interface.

9. The system of claim 7 or 8, wherein the search interface comprises a time-domain query interface.

10. The system of any of claims 7-9, wherein the search interface comprises a geographical query interface.

11. A distributed grid medical informatics system that implements a plurality of Semantic Graph Databases (SemanticGraphDB), the system comprising:

a plurality of graph databases for storing:

(a) normalized patient-specific clinical data, the data comprising structured clinical datum, unstructured clinical datum, or claims datum; and

a plurality of data warehouses comprising unnormalized patient-specific clinical data;

a terminology server comprising biomedical terminologies and ontologies; and

a plurality of processors each operatively coupled to the terminology server, a data warehouse and a graph database, and configured to populate the graph database with normalized patient-specific clinical data and relations by normalizing patient-specific clinical data from the data warehouse using terminologies and ontologies retrieved from the terminology server.

12. The system of claim 11, further comprising a search interface, wherein the search interface allows a user to search the normalized patient-specific clinical data from one or more graph databases using one or more biomedical terminologies.

13. A system for semantically harmonizing unnormalized patient-specific data after it has already been loaded into a data warehouse, the system comprising:

a graph database for storing semantically harmonized patient-specific clinical data;

a data warehouse comprising unnormalized patient-specific clinical data;

a terminology server comprising biomedical terminologies and ontologies, the terminology server in communication with the graph database and the data warehouse; and

a processor operatively coupled to the terminology server, the data warehouse and the graph database, and configured to populate the graph database with semantically harmonized patient-specific clinical data by applying biomedical terminologies and ontologies retrieved from the terminology server to the unnormalized patient-specific clinical data.

14. The system of claim 13, wherein the data warehouse system has the ability to store both structured clinical data and unstructured clinical data.

15. The system of claim 13 or 14, wherein the semantically harmonized patient-specific clinical data stored in the graph database comprises structured clinical data, unstructured clinical data, and claims data.

16. The system of claim 13, wherein the processor is further configured to populate the graph database with the relations between each patient-specific clinical datum and one or more biomedical terminologies from the terminology server.

17. A method of managing data via a Semantic Graph Database, the method comprising:

normalizing unnormalized patient-specific clinical data stored in a plurality of data warehouses by applying biomedical terminologies and ontologies from a terminology server, thereby producing normalized patient-specific clinical data;

translating the normalized patient-specific clinical data by processing with an ontology mapping engine, wherein the ontology mapping engine receives input terminologies from the terminology server, thereby producing translated and normalized patient-specific clinical data; and

loading the translated and normalized patient-specific clinical data into a graph database.

18. The method of claim 17, further comprising querying the graph database after the translated and normalized patient-specific clinical data has been loaded into the graph database.

19. The method according to claim 17 or 18, further comprising removing PHI from the unnormalized patient-specific clinical data stored in the data warehouse prior to the loading of the translated and normalized patient-specific clinical data into a graph database.

20. A method of mapping ICD-9 medical diagnoses to ICD-10 medical diagnoses, the method comprising:

normalizing unnormalized patient-specific clinical data stored in a data warehouse by applying biomedical terminologies and ontologies from a terminology server, thereby producing normalized patient-specific clinical data, wherein the unnormalized patient-specific clinical data comprises structured clinical data, unstructured clinical data, and claims data;

translating the normalized patient-specific clinical data by using an ontology mapping engine to associate ICD-9 codes with diagnoses;

loading the translated clinical data into a graph database; and

mapping ICD-9 medical diagnoses to ICD-10 medical diagnoses by applying a plurality of cross-walk algorithms to the graphical database.

21. A method for constructing a retrospective record of a patient's health, the method comprising:

aggregating unnormalized patient-specific structured clinical data, claims data, and unstructured clinical data that is stored in a data warehouse;

normalizing the aggregated data by applying biomedical terminologies and ontologies retrieved from a terminology server;

translating the normalized clinical data using an ontology mapping engine; and

loading the translated clinical data into a graph database.

22. The method of claim 21, further comprising searching the graph database using a faceted search environment.

23. A method for evaluating the quality of care given to patients, the method comprising:

constructing retrospective records of health of a plurality of patients by performing, for each patient, the steps of:

translating the normalized clinical data using an ontology mapping engine; and

loading the translated clinical data into a graph database;

clustering the retrospective health records based upon diagnosis and outcome; and

identifying whether a patient is at risk of receiving lesser quality of care based upon one or more differences between the patient's retrospective record of health and the retrospective records of health of the plurality of patients within the same diagnosis cluster.

24. The method of claim 23, further comprising an access control device, wherein the access control device is used at least in part to allow or deny access to particular biomedical terminologies and ontologies.