CN113742348A - Patient data matching method in CDR system, main index establishing method and device - Google Patents

Info

Publication number
CN113742348A
Authority
CN
China
Prior art keywords
data
matching
confirmed
ith
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111045885.3A
Other languages
Chinese (zh)
Other versions
CN113742348B (en)
Inventor
刘新辉
张勇斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Clinbrain Information Technology Co Ltd
Original Assignee
Shanghai Clinbrain Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Clinbrain Information Technology Co Ltd filed Critical Shanghai Clinbrain Information Technology Co Ltd
Priority to CN202111045885.3A priority Critical patent/CN113742348B/en
Publication of CN113742348A publication Critical patent/CN113742348A/en
Application granted granted Critical
Publication of CN113742348B publication Critical patent/CN113742348B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a patient data matching method, a main index establishing method and a main index establishing device in a CDR system. Wherein the patient data matching method in the CDR system comprises the following steps: acquiring data to be matched and confirmed data; sequentially acquiring at least two similarities of each confirmed data based on at least two combinations of the matching fields; and judging whether the matching is successful or not based on all the similarity, and obtaining the confirmed data matched with the data to be matched. Based on the patient data matching method in the CDR system, the main index of the patient can be constructed, all historical diagnosis records of the patient can be further obtained, the diagnosis of the patient condition and medical research can be assisted, and the problem that uniform patient identification does not exist among all business systems of a hospital in the prior art is solved. On the other hand, the effectiveness of the matching result is increased by using a multi-turn matching mode, and the complex data working conditions can be met.

Description

Patient data matching method in CDR system, main index establishing method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a patient data matching method in a CDR system, a main index establishing method and a device.
Background
At present, a plurality of information systems are arranged in a hospital, patient identifications among the information systems are inconsistent, association and cross indexing cannot be carried out to obtain other related information, an information island is easily formed, the maximum utilization of medical data resources cannot be realized, and the consistency and the integrity of the patient information of each system are poor.
In summary, in the prior art, there is a problem that there is no uniform patient identification between the business systems.
Disclosure of Invention
The invention aims to provide a patient data matching method, a main index establishing method and a device in a CDR system, and aims to solve the problem that uniform patient identification does not exist among all business systems of a hospital in the prior art.
In order to solve the above technical problem, according to a first aspect of the present invention, there is provided a patient data matching method in a CDR system, including:
acquiring data to be matched and confirmed data, wherein the data to be matched comprises a matching field, and the confirmed data comprises the matching field;
sequentially acquiring the ith similarity of the data to be matched and each confirmed data based on the ith combination of the matching fields;
judging whether the matching is successful or not based on all the ith similarity, and if so, obtaining a piece of confirmed data matched with the data to be matched based on all the ith similarity;
wherein, the value range of i is all integers from 1 to n, and n is an integer larger than 1.
Optionally, the step of determining whether the matching is successful based on all the ith similarity includes: if the ith similarity corresponding to each piece of confirmed data is smaller than or equal to the ith threshold, the matching is failed; otherwise, the matching is successful;
or,
the step of judging whether the matching is successful or not based on all the ith similarity comprises the following steps: if the sum of all the ith similarity corresponding to each piece of confirmed data is smaller than a preset threshold value, the matching fails; otherwise, the matching is successful.
Optionally, the step of obtaining a piece of confirmed data matched with the data to be matched based on all the ith similarities includes: selecting the confirmed data with the largest sum of all the ith similarities.
Optionally, the step of obtaining a piece of confirmed data matched with the data to be matched based on all the ith similarities includes:
if the ith similarity of at least one piece of confirmed data in the ith set is greater than the ith threshold and i is less than n, forming an (i+1)th set from the confirmed data in the ith set whose ith similarity is greater than the ith threshold, and judging again;
otherwise, selecting the confirmed data with the largest sum of all the ith similarity in the ith set, or selecting the confirmed data with the largest ith similarity in the ith set;
wherein the 1 st set is all the confirmed data.
Optionally, the step of obtaining the ith similarity between the data to be matched and the confirmed data includes:
and sequentially obtaining a similarity value corresponding to each matching field in the ith combination, wherein the similarity value is weighted and averaged based on the ith weighting parameter to obtain the ith similarity.
Optionally, each matching field in the data to be matched only stores one attribute value, and each matching field in the confirmed data stores one or more than two attribute values; the step of obtaining the similar value corresponding to the matching field comprises:
and performing similarity calculation on the attribute values in the data to be matched and each attribute value in the corresponding matching field in the confirmed data, and obtaining the similarity value after weighted average of calculation results.
Optionally, the 1 st combination includes a name field, a gender field, and an identification number field, and the 1 st weighting parameter corresponding to the identification number field is greater than 0.5.
Optionally, the matching field includes a name field, and the method for obtaining the similar value corresponding to the name field includes: calculated according to the following formula:
$$\text{similarity} = 1 - \frac{ED_{AB}}{\max(L_A, L_B)}$$
where similarity represents the similarity value, $ED_{AB}$ represents the edit distance between A and B, max() represents the maximum operation, $L_A$ represents the string length of A, $L_B$ represents the string length of B, A represents the attribute value stored in the name field in the data to be matched, and B represents the attribute value stored in the name field in the confirmed data.
In order to solve the above technical problem, according to a second aspect of the present invention, there is provided a patient master index establishing method in a CDR system, including:
acquiring original data from at least two service systems, wherein the original data comprises matching fields;
the raw data is classified into first data and second data based on a cleaning rule;
the first data generates confirmed data based on a merging rule, and the confirmed data comprises a matching field and a main index field;
the second data is configured as data to be matched, and a matching result is obtained for the data to be matched based on the patient data matching method in the CDR system;
if the matching is successful, combining the current data to be matched and the matched confirmed data;
and if the matching fails, generating temporary index data by the current data to be matched.
Optionally, the matching field includes a name field and an identity card field, and the merge rule includes:
judging whether the identity card fields of the first data and one piece of confirmed data are equal, and judging whether the name fields of the first data and the confirmed data are equal;
if the identity card field of the current first data is equal to that of one piece of confirmed data and the name field of the current first data is equal to that of the same piece of confirmed data, merging the current first data and the current confirmed data;
otherwise, the current first data is independently converted into a new confirmed data.
Optionally, each matching field in the first data stores only one attribute value, each matching field in the confirmed data stores one or more than two attribute values, and the step of determining whether the matching fields of the first data and the confirmed data are equal includes:
if the attribute value of the matching field of the first data is a null value, judging that the attribute values are not equal;
if the attribute value of the matching field of the first data is not a null value and the attribute value of the matching field of the first data is not equal to the attribute value of the matching field of the confirmed data, judging that the attribute values are not equal;
and if the attribute value of the matching field of the first data is not a null value and the attribute value of the matching field of the first data is equal to one of the attribute values of the matching field of the confirmed data, judging that the attribute values are equal.
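The three cases above can be sketched as a small predicate. This is a minimal sketch, not the patented implementation: the function name `field_equal` is hypothetical, and treating both `None` and the empty string as a null value is an assumption.

```python
def field_equal(first_value, confirmed_values):
    """Return True only if the first data's value is non-null and equals
    one of the attribute values stored in the confirmed data's field."""
    # Assumption: None and "" both count as a null value.
    if first_value is None or first_value == "":
        return False  # a null value is always judged "not equal"
    return first_value in confirmed_values


print(field_equal(None, ["Zhang San"]))                      # False
print(field_equal("Li Si", ["Zhang San"]))                   # False
print(field_equal("Zhang San", ["Zhang San", "Zhang Shan"]))  # True
```

Judging a null value as "not equal" keeps records with missing identifiers from being merged by accident.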
Optionally, the step of merging the data to be merged and the confirmed data includes:
the data to be merged and the confirmed data are sequentially judged about each matching field, and if the attribute value of the matching field of the data to be merged is not a null value and the attribute value of the matching field of the data to be merged is not equal to the attribute value of the matching field of the confirmed data, the current attribute value of the data to be merged is stored in the matching field of the confirmed data;
wherein the data to be merged comprises the first data and the data to be matched.
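The merging step can be illustrated with a short sketch. The record layout is an assumption made for illustration: the data to be merged is a dict holding one value per matching field, and the confirmed data is a dict holding a list of values per matching field. The name `merge_into_confirmed` and the field names are hypothetical.

```python
def merge_into_confirmed(to_merge, confirmed, matching_fields):
    """Store each non-null, not-yet-present attribute value of the data
    to be merged in the corresponding field of the confirmed data."""
    for field in matching_fields:
        value = to_merge.get(field)
        if value is None or value == "":   # assumption: None / "" are null
            continue
        values = confirmed.setdefault(field, [])
        if value not in values:            # only store values not already present
            values.append(value)
    return confirmed


confirmed = {"name": ["Zhang San"], "phone": ["13800000000"]}
incoming = {"name": "Zhang San", "phone": "13900000000", "address": None}
print(merge_into_confirmed(incoming, confirmed, ["name", "phone", "address"]))
# {'name': ['Zhang San'], 'phone': ['13800000000', '13900000000']}
```

Because values accumulate, later matching rounds can compare a new record against every value previously seen for the same patient.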
In order to solve the technical problem, according to a third aspect of the present invention, a patient master index creating apparatus is provided, which includes a matching module for executing the data matching method.
Compared with the prior art, in the patient data matching method, the main index establishing method and the device in the CDR system provided by the invention, the patient data matching method in the CDR system comprises the following steps: acquiring data to be matched and confirmed data; sequentially acquiring at least two similarities of each confirmed data based on at least two combinations of the matching fields; and judging whether the matching is successful or not based on all the similarity, and obtaining the confirmed data matched with the data to be matched. Based on the patient data matching method in the CDR system, the main index of the patient can be constructed, all historical diagnosis records of the patient can be further obtained, the diagnosis of the patient condition and medical research can be assisted, and the problem that uniform patient identification does not exist among all business systems of a hospital in the prior art is solved. On the other hand, the effectiveness of the matching result is increased by using a multi-turn matching mode, and the complex data working conditions can be met.
Drawings
It will be appreciated by those skilled in the art that the drawings are provided for a better understanding of the invention and do not constitute any limitation to the scope of the invention. Wherein:
FIG. 1 is a schematic flow chart of a patient data matching method in a CDR system according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a patient primary index establishing method in the CDR system according to an embodiment of the present invention.
Detailed Description
To further clarify the objects, advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It is to be noted that the drawings are in greatly simplified form and are not to scale, but are merely intended to facilitate and clarify the explanation of the embodiments of the present invention. Further, the structures illustrated in the drawings are often part of actual structures. In particular, the drawings may have different emphasis points and may sometimes be scaled differently.
As used in this application, the singular forms "a", "an" and "the" include plural referents, the term "or" is generally employed in a sense including "and/or," the terms "a" and "an" are generally employed in a sense including "at least one," the terms "at least two" are generally employed in a sense including "two or more," and the terms "first", "second" and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicit to the number of technical features indicated. Thus, features defined as "first", "second" and "third" may explicitly or implicitly include one or at least two of the features, "one end" and "the other end" and "proximal end" and "distal end" generally refer to the corresponding two parts, which include not only the end points, but also the terms "mounted", "connected" and "connected" should be understood broadly, e.g., as a fixed connection, as a detachable connection, or as an integral part; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. Furthermore, as used in the present invention, the disposition of an element with another element generally only means that there is a connection, coupling, fit or driving relationship between the two elements, and the connection, coupling, fit or driving relationship between the two elements may be direct or indirect through intermediate elements, and cannot be understood as indicating or implying any spatial positional relationship between the two elements, i.e., an element may be in any orientation inside, outside, above, below or to one side of another element, unless the content clearly indicates otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
The core idea of the invention is to provide a patient data matching method, a main index establishing method and a device in a CDR (Clinical Data Repository) system, so as to solve the problem in the prior art that no uniform patient identification exists among the business systems of a hospital.
The following description refers to the accompanying drawings.
Referring to fig. 1 to fig. 2, fig. 1 is a schematic flow chart illustrating a patient data matching method in a CDR system according to an embodiment of the present invention; fig. 2 is a flowchart illustrating a patient primary index establishing method in the CDR system according to an embodiment of the present invention.
As shown in fig. 1, the present embodiment provides a patient data matching method in a CDR system, including:
s100, acquiring data to be matched and confirmed data, wherein the data to be matched comprises a matching field, and the confirmed data comprises the matching field;
s200, sequentially obtaining the ith similarity of the data to be matched and each confirmed data based on the ith combination of the matching fields;
s300, judging whether the matching is successful or not based on all the ith similarity, and if so, obtaining a piece of confirmed data matched with the data to be matched based on all the ith similarity;
wherein, the value of i is all integers from 1 to n, and n is an integer larger than 1.
In step S100, "comprises a matching field" should be understood as follows. Assume that the field name of one of the matching fields is "F", and that the attribute value of the "F" field of a certain piece of data (either the data to be matched or the confirmed data; referred to here simply as "data" for convenience) is null. Depending on the specification or standard in use, that piece of data may contain a string such as "F: NULL" or "F: ;", or may not contain any string representing the "F" field at all. However, as long as some piece of data among all the data processed by the method contains a string representing the "F" field, or the method used to parse, read, or post-process the data includes an operation related to the "F" field, the piece of data should be understood as comprising the "F" field.
In step S200, n calculations are performed in total; the ith calculation uses the ith combination and yields the ith similarity. The k1th calculation and the k2th calculation differ in that the k1th combination differs from the k2th combination and/or the calculation manner differs, where k1 ≠ k2 and k1 and k2 are each any integer from 1 to n.
In step S300, the scheme for judging whether the matching is successful and the scheme for obtaining the matched data can be understood by reference to the later part of this embodiment. A person skilled in the art may also modify the specific schemes provided in this embodiment; such modifications should likewise be understood to fall within the protection scope of the technical solution of the present invention.
With the configuration, through multiple rounds of fuzzy matching, the data to be matched can find the closest matching data, and the establishment of a subsequent global index value is facilitated. The method is a core method for solving the problem that uniform patient identification does not exist among all business systems of a hospital.
In an embodiment, the step of determining whether the matching is successful based on all the ith similarities includes: if the ith similarity corresponding to each piece of confirmed data is less than or equal to the ith threshold (that is, this holds for every i from 1 to n), the matching fails; otherwise, the matching is successful.
Assume that the confirmed data are 3 pieces and are numbered 1, 2, and 3, respectively. The value of n is 3, the 1 st threshold is 0.9, the 2 nd threshold is 0.9, and the 3 rd threshold is 0.9. The similarity between the data to be matched and the confirmed data is shown in table 1.
Table 1. Similarity between the data to be matched and each piece of confirmed data
Confirmed data No. | 1st similarity | 2nd similarity | 3rd similarity
1 | 0.8 | 0.7 | 0.6
2 | 0.6 | 0.7 | 0.8
3 | 0.3 | 0.2 | 0.3
Since the ith similarity of each of the confirmed data in table 1 is smaller than the ith threshold, the matching is considered to be failed.
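The per-round threshold judgment can be checked against the values of Table 1 with a minimal sketch. The function name `match_failed_per_round` and the dict layout (record number mapped to a tuple of similarities) are assumptions made for illustration.

```python
def match_failed_per_round(similarities, thresholds):
    """The matching fails only if, for every piece of confirmed data and
    every round i, the ith similarity is <= the ith threshold."""
    return all(
        s <= t
        for row in similarities.values()
        for s, t in zip(row, thresholds)
    )


# Values copied from Table 1; all three thresholds are 0.9.
table_1 = {1: (0.8, 0.7, 0.6), 2: (0.6, 0.7, 0.8), 3: (0.3, 0.2, 0.3)}
print(match_failed_per_round(table_1, (0.9, 0.9, 0.9)))  # True
```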
In another embodiment, the step of determining whether the matching is successful based on all the ith similarities comprises: if the sum of all the ith similarity corresponding to each piece of confirmed data is smaller than a preset threshold value, the matching fails; otherwise, the matching is successful.
Assume that the confirmed data are 3 pieces and are numbered 1, 2, and 3, respectively. The value of n is 3, and the preset threshold value is 2.7. The similarity between the data to be matched and the confirmed data is shown in table 2.
Table 2. Similarity between the data to be matched and each piece of confirmed data
Confirmed data No. | 1st similarity | 2nd similarity | 3rd similarity
1 | 0.8 | 0.7 | 0.6
2 | 0.6 | 0.7 | 0.8
3 | 0.3 | 0.2 | 0.3
Since the sum of all similarities of each of the confirmed data in table 2 is less than 2.7, the matching is considered to fail.
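The sum-based judgment can be sketched in the same way, using the values of Table 2. The function name `match_failed_by_sum` and the dict layout are assumptions made for illustration.

```python
def match_failed_by_sum(similarities, preset_threshold):
    """The matching fails if the sum of all similarities of every piece
    of confirmed data is smaller than the preset threshold."""
    return all(sum(row) < preset_threshold for row in similarities.values())


# Values copied from Table 2; the preset threshold is 2.7.
table_2 = {1: (0.8, 0.7, 0.6), 2: (0.6, 0.7, 0.8), 3: (0.3, 0.2, 0.3)}
print(match_failed_by_sum(table_2, 2.7))  # True
```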
In an embodiment, the step of obtaining a piece of confirmed data matching the data to be matched based on all the ith similarities includes: selecting the confirmed data with the largest sum of all the ith similarities.
Assume that the confirmed data are 3 pieces and are numbered 1, 2, and 3, respectively. The value of n is 3. The similarity between the data to be matched and the confirmed data is shown in table 3.
Table 3. Similarity between the data to be matched and each piece of confirmed data
Confirmed data No. | 1st similarity | 2nd similarity | 3rd similarity
1 | 0.8 | 0.7 | 0.8
2 | 0.6 | 0.7 | 0.8
3 | 0.3 | 0.2 | 0.3
Since the sum of all the similarities of the confirmed data numbered 1 in Table 3 is 2.3 (0.8 + 0.7 + 0.8), which is the maximum value, the confirmed data numbered 1 is selected as the matched data.
It should be understood that although the sum of all the similarity degrees of the 1 st data in table 3 is smaller than the preset threshold 2.7 in the previous example, in the present embodiment, it is not limited what the condition of the failure of the judgment is. In this embodiment, a scheme that the sum is greater than 1, i.e., the matching is considered to be successful, a scheme that the 1 st threshold to the 3 rd threshold are all 0.5, and other possible schemes may be selected. The examples herein are merely illustrative of selection criteria for matching data, and are not judgment criteria for whether matching is successful or not.
It should be understood that when at least two pieces of data meet the condition (e.g., two pieces of data tie for the maximum), one of them is selected according to an additional rule: for example, the data with the highest 1st similarity, a randomly chosen piece of data, the data with the earliest creation time, or some other comprehensive rule. In most cases, exactly one piece of confirmed data has the largest sum of all the ith similarities, and the additional rule exists only to prevent program errors, so it can be a simple rule. Descriptions with similar logic in the rest of this specification can be understood in the same way.
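Selection by largest sum, together with one possible additional rule for ties, can be sketched as follows. The tie-break order used here (highest 1st similarity, then smallest record number) is an assumption chosen from the options the text lists; the function name `select_largest_sum` is hypothetical.

```python
def select_largest_sum(similarities):
    """Pick the confirmed data with the largest sum of similarities.
    Ties are broken by the highest 1st similarity, then by the smallest
    record number (an assumed additional rule)."""
    return max(
        similarities,
        key=lambda k: (sum(similarities[k]), similarities[k][0], -k),
    )


# Values copied from Table 3.
table_3 = {1: (0.8, 0.7, 0.8), 2: (0.6, 0.7, 0.8), 3: (0.3, 0.2, 0.3)}
print(select_largest_sum(table_3))  # 1
```

A deterministic tie-break such as this one keeps repeated runs over the same data reproducible, which a random choice would not.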
In another embodiment, the step of obtaining a piece of confirmed data matching the data to be matched based on all the ith similarities includes:
S301, if the ith similarity of at least one piece of confirmed data in the ith set is greater than the ith threshold and i is less than n, forming an (i+1)th set from the confirmed data in the ith set whose ith similarity is greater than the ith threshold, and judging again;
s302, if not, selecting the confirmed data with the maximum sum of all the ith similarity in the ith set;
wherein the 1 st set is all the confirmed data.
Assume that the confirmed data are 5 pieces and are numbered 1, 2, 3, 4, 5, respectively. The value of n is 3, the 1 st threshold is 0.6, the 2 nd threshold is 0.75, and the 3 rd threshold is 0.65. The similarity between the data to be matched and the confirmed data is shown in table 4.
TABLE 4 similarity of the matching data
[Table 4 appears as an image in the original publication (BDA0003251168520000081, BDA0003251168520000091); its values are not recoverable from this text.]
In round 1, the 1st similarity of the confirmed data numbered 3 is 0.6, which is not greater than the 1st threshold, so the 2nd set consists of the confirmed data numbered 1, 2, 4, and 5. In round 2, the 3rd set consists of the confirmed data numbered 4 and 5. For the 3rd set, i equals n, so the rule "selecting the confirmed data with the largest sum of all the ith similarities in the ith set" applies and the data numbered 4 is selected. As this example shows, although the data numbered 3 has the largest sum of all similarities, it is not the final matched data.
In another example, the 2nd threshold is 0.8, and all other conditions are the same as in the previous example. In this case, no data in the 2nd set has a 2nd similarity greater than the 2nd threshold, so the data with the largest sum of all similarities in the 2nd set, that is, the data numbered 1, is selected.
The core idea of the above logic is to select according to a mechanism similar to a knockout competition, and if the similarity of a certain piece of the confirmed data in a certain round is low, the confirmed data is excluded from the candidate list.
In another embodiment, the step of obtaining a piece of confirmed data matching the data to be matched based on all the ith similarities includes:
S301, if the ith similarity of at least one piece of confirmed data in the ith set is greater than the ith threshold and i is less than n, forming an (i+1)th set from the confirmed data in the ith set whose ith similarity is greater than the ith threshold, and judging again;
s302, if not, selecting the confirmed data with the maximum ith similarity in the ith set;
wherein the 1 st set is all the confirmed data.
The main idea of the above embodiment is substantially the same as that of the previous embodiment, except that the data with the largest ith similarity is selected last, and the specific implementation process of the data can be understood with reference to the previous embodiment.
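The knockout-style selection of the two embodiments above can be sketched as one function with a switch for the final rule. The similarity values below are hypothetical: they are chosen only to be consistent with the walk-through (thresholds 0.6, 0.75, 0.65) and are not the values of the original Table 4. The names `knockout_select` and `final_rule` are also assumptions.

```python
def knockout_select(similarities, thresholds, final_rule="sum"):
    """Round-by-round elimination (S301/S302).
    final_rule="sum": pick the largest sum of similarities in the last set.
    final_rule="ith": pick the largest ith similarity in the last set."""
    n = len(thresholds)
    current = list(similarities)   # the 1st set is all confirmed data
    round_idx = 0                  # zero-based index of the current round
    while round_idx < n - 1:       # corresponds to the condition i < n
        survivors = [k for k in current
                     if similarities[k][round_idx] > thresholds[round_idx]]
        if not survivors:          # nobody exceeds the threshold: stop here
            break
        current = survivors        # form the (i+1)th set and judge again
        round_idx += 1
    if final_rule == "sum":
        return max(current, key=lambda k: sum(similarities[k]))
    return max(current, key=lambda k: similarities[k][round_idx])


# Hypothetical similarity values (NOT the original Table 4).
table = {
    1: (0.90, 0.70, 0.90),
    2: (0.70, 0.70, 0.60),
    3: (0.60, 0.99, 0.99),
    4: (0.80, 0.78, 0.80),
    5: (0.70, 0.76, 0.70),
}
print(knockout_select(table, (0.6, 0.75, 0.65)))  # 4
print(knockout_select(table, (0.6, 0.80, 0.65)))  # 1
```

With these values, record 3 has the largest overall sum but is eliminated in round 1, which mirrors the point made in the walk-through.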
Further, the step of obtaining the ith similarity between the data to be matched and the confirmed data includes:
and sequentially obtaining a similarity value corresponding to each matching field in the ith combination, wherein the similarity value is weighted and averaged based on the ith weighting parameter to obtain the ith similarity.
For example, the ith combination includes the matching fields "C", "D", and "E", where the ith weighting parameter of "C" is 0.2, the ith weighting parameter of "D" is 0.3, and the ith weighting parameter of "E" is 0.5. If the similarity value corresponding to "C" is 0.7, the similarity value corresponding to "D" is 0.5, and the similarity value corresponding to "E" is 0.8, the final similarity is 0.2 × 0.7 + 0.3 × 0.5 + 0.5 × 0.8 = 0.69.
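The weighted average in this example can be checked with a short sketch. The function name `weighted_similarity` is hypothetical, and dividing by the total weight is an assumption that only matters when the weights do not already sum to 1.

```python
def weighted_similarity(values, weights):
    """Weighted average of per-field similarity values for one combination."""
    total_weight = sum(weights[f] for f in values)
    # Assumption: normalize by the total weight in case it is not exactly 1.
    return sum(values[f] * weights[f] for f in values) / total_weight


weights = {"C": 0.2, "D": 0.3, "E": 0.5}   # ith weighting parameters
values = {"C": 0.7, "D": 0.5, "E": 0.8}    # per-field similarity values
print(round(weighted_similarity(values, weights), 2))  # 0.69
```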
It should be understood that when i takes different values, the ith weighting parameter corresponding to the same matching field may be different.
Further, each matching field in the data to be matched only stores one attribute value, and each matching field in the confirmed data stores one or more than two attribute values; the step of obtaining the similar value corresponding to the matching field comprises:
and performing similarity calculation on the attribute values in the data to be matched and each attribute value in the corresponding matching field in the confirmed data, and obtaining the similarity value after weighted average of calculation results.
When the matching fields are "C", "D", and "E", one possible form of the data to be matched is shown in table 5.
Table 5. Exemplary form of the data to be matched
Field name | C | D | E
Attribute value | 3 | 8 | 6
One possible form of the confirmed data is shown in table 6.
TABLE 6 exemplary forms of validated data
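[Table 6 appears as an image in the original publication (BDA0003251168520000101); its values are not recoverable from this text.]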
In one piece of the confirmed data shown in table 6, the attribute values of the "C" field are 3, 4, and 7.
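Comparing one attribute value against a multi-valued field can be sketched as follows. The source does not specify the weights used when averaging the per-value results, so equal weights are assumed here by default; the exact-match comparison `exact` and the name `field_similarity` are likewise placeholders for whatever per-value similarity a real system uses.

```python
def field_similarity(value, confirmed_values, value_similarity, weights=None):
    """Compare one value with every value stored in the confirmed data's
    field and return the weighted average of the results."""
    if not confirmed_values:
        return 0.0                       # assumption: empty field scores 0
    if weights is None:
        weights = [1.0] * len(confirmed_values)   # assumption: equal weights
    scores = [value_similarity(value, c) for c in confirmed_values]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def exact(a, b):
    """Placeholder per-value similarity: 1.0 on equality, else 0.0."""
    return 1.0 if a == b else 0.0


# "C" field: value 3 in the data to be matched, values 3, 4, 7 in the confirmed data.
print(field_similarity(3, [3, 4, 7], exact))
```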
It should be understood that, in an actual service method, the data to be matched and the confirmed data further include other fields related to services, and the present specification does not limit the storage manner of the above data when storing non-matching fields.
In a preferred embodiment, the 1 st combination comprises a name field, a gender field and an identification number field, and the 1 st weighting parameter corresponding to the identification number field is greater than 0.5. For example, the 1 st weighting parameter of the name field is 0.1, the 1 st weighting parameter of the gender field is 0.1, and the 1 st weighting parameter corresponding to the identification number field is 0.8. With the configuration, the 1 st similarity obtained by calculation can have higher discrimination.
In some embodiments, the 2 nd combination may include a contact phone field, the 2 nd weighting parameter of the contact phone field is greater than 0.5, and other fields of the 2 nd combination may be set according to different requirements. The 3 rd combination may include a home address field, a 3 rd weighting parameter of the home address field is greater than 0.5, and other fields of the 3 rd combination may be set according to different requirements.
The matching field comprises a name field, and the method for acquiring the similar value corresponding to the name field comprises the following steps: calculated according to the following formula:
$$\text{similarity} = 1 - \frac{ED_{AB}}{\max(L_A, L_B)}$$
where similarity represents the similarity value, $ED_{AB}$ represents the edit distance between A and B, max() represents the maximum operation, $L_A$ represents the string length of A, $L_B$ represents the string length of B, A represents the attribute value stored in the name field in the data to be matched, and B represents the attribute value stored in the name field in the confirmed data.
The edit distance, also called the Levenshtein distance, is the minimum number of edit operations required to convert one character string into another. The permitted edit operations are replacing one character with another, inserting one character, and deleting one character. The edit distance was first proposed by the Russian scientist Vladimir Levenshtein. With this configuration, the similarity between name strings can be calculated, and when two names are completely different the result is 0, as expected.
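A minimal sketch of the name similarity follows. It assumes the formula has the normalized form 1 − ED(A, B) / max(L_A, L_B), which is consistent with the symbol definitions and with the statement that completely different names give 0; the formula itself is an image in the original publication. The function names and the handling of two empty strings are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))       # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                       # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace (or keep)
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Assumed form: 1 - ED(A, B) / max(len(A), len(B))."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0   # assumption: two empty names are treated as identical
    return 1.0 - edit_distance(a, b) / longest


print(name_similarity("张伟", "张玮"))  # 0.5
print(name_similarity("abc", "xyz"))   # 0.0
```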
Referring to fig. 2, the method for establishing a patient main index in a CDR system includes:
S10, acquiring original data from at least two service systems, wherein the original data comprises matching fields;
S20, classifying the original data into first data and second data based on a cleaning rule;
S31, generating confirmed data from the first data based on a merging rule, wherein the confirmed data comprises the matching fields and a main index field;
S41, configuring the second data as data to be matched, and obtaining a matching result for the data to be matched based on the data matching method;
S42, if the matching succeeds, merging the current data to be matched into the matched confirmed data;
S43, if the matching fails, generating temporary index data from the current data to be matched.
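The control flow of steps S20 to S43 can be sketched as below. This is only a skeleton under stated assumptions: the cleaning rule, merging rule, matching method and merge operation are passed in as callables with hypothetical names, and records are plain dictionaries already acquired in step S10.

```python
def build_master_index(raw_records, is_first_data, merge_or_create, match, merge):
    """Skeleton of steps S20-S43; all four callables are supplied by the caller."""
    confirmed = []  # confirmed data, each piece carrying a main index
    temporary = []  # temporary index data for records that match nothing

    # S20: classify the original data with the cleaning rule.
    first, second = [], []
    for record in raw_records:
        (first if is_first_data(record) else second).append(record)

    # S31: turn the first data into confirmed data with the merging rule.
    for record in first:
        merge_or_create(record, confirmed)

    # S41-S43: match each piece of second data against the confirmed data.
    for record in second:
        hit = match(record, confirmed)
        if hit is not None:
            merge(record, hit)        # S42: matching succeeded
        else:
            temporary.append(record)  # S43: matching failed
    return confirmed, temporary
```

Keeping the rules as parameters mirrors the statement that the cleaning rule and the main index generation rule can be set according to actual requirements.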
In fig. 2, the accurate data is the first data, the fuzzy data is the second data, and the fuzzy matching is the data matching method described above. The CDR system data in fig. 2 is the original data acquired from at least two service systems in step S10. Merging into the accurate data in step S42 should be understood as merging into the accurate data for which the main index has already been generated, i.e. the confirmed data.
The flows for processing stock data and incremental data do not actually differ. The only difference is that when stock data is processed (that is, when the patient main index establishing method in the CDR system is run for the first time), the number of pieces of confirmed data is initially 0, whereas when incremental data is processed, some confirmed data already exists.
The generation rule of the main index field can be set according to actual requirements and is not described further here. The cleaning rule in step S20 may likewise be set according to actual needs. In one embodiment, data whose name field is null and whose identity card field is null is classified as the second data, and the rest is classified as the first data. Other rules may be set in other embodiments.
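A literal reading of the example cleaning rule can be sketched as follows. The field keys `name` and `id_card`, and the choice to treat blank strings as null, are assumptions for illustration; other embodiments may use a different rule.

```python
def is_null(value) -> bool:
    """Treat None and blank strings as null (an assumption of this sketch)."""
    return value is None or str(value).strip() == ""


def classify(record: dict) -> str:
    """Return "second" when both the name and the identity card field are null."""
    if is_null(record.get("name")) and is_null(record.get("id_card")):
        return "second"
    return "first"
```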
Further, the matching field includes a name field and an identity card field, and the merge rule includes:
judging whether the identity card fields of the first data and a piece of confirmed data are equal, and judging whether the name fields of the first data and that piece of confirmed data are equal;
if both the identity card fields and the name fields of the current first data and the current piece of confirmed data are equal, merging the current first data and the current confirmed data;
otherwise, the current first data is independently converted into a new confirmed data.
That is, equal data is merged, and unequal data is independently converted into a new piece of confirmed data. The conversion process may include copying the whole content of the first data and adding the main index field; it may also include other steps required by the business logic, which those skilled in the art can set according to common knowledge and which are not described here. In theory, two pieces of data that do not meet the condition may actually refer to the same patient. In practice, however, the number of errors caused by this rule has been found to be very small, and such errors can be corrected manually after they occur, so this embodiment adopts the above logic.
Each of the matching fields in the first data stores only one attribute value, and each of the matching fields in the confirmed data stores one or more attribute values; this logic may also be understood with reference to the foregoing description of tables 5 and 6. The step of determining whether the matching fields of the first data and the confirmed data are equal comprises:
if the attribute value of the matching field of the first data is a null value, judging that the attribute values are not equal;
if the attribute value of the matching field of the first data is not a null value and is not equal to any attribute value of the matching field of the confirmed data, judging that the attribute values are not equal;
and if the attribute value of the matching field of the first data is not a null value and is equal to one of the attribute values of the matching field of the confirmed data, judging that the attribute values are equal.
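The equality judgement and the merging rule built on it can be sketched as follows. The sketch assumes that confirmed data stores each matching field as a list of attribute values; the field keys, the counter used as a placeholder main index, and the function names are hypothetical.

```python
from itertools import count

MATCHING_FIELDS = ("name", "id_card")  # hypothetical field keys
_index_counter = count(1)              # placeholder main index generator


def is_null(value) -> bool:
    return value is None or value == ""


def field_equal(first_value, confirmed_values) -> bool:
    """A null value is never equal; otherwise any stored value may match."""
    return (not is_null(first_value)) and first_value in confirmed_values


def find_equal_confirmed(first, confirmed_list):
    """Return the confirmed data whose name and identity card both equal `first`."""
    for confirmed in confirmed_list:
        if all(field_equal(first.get(f), confirmed[f]) for f in MATCHING_FIELDS):
            return confirmed
    return None  # caller converts `first` into new confirmed data


def to_confirmed(first):
    """Convert first data into new confirmed data and add a main index field."""
    record = {f: ([] if is_null(first.get(f)) else [first[f]])
              for f in MATCHING_FIELDS}
    record["main_index"] = next(_index_counter)
    return record
```

When `find_equal_confirmed` returns a record, the first data is merged into it; when it returns `None`, `to_confirmed` creates a new piece of confirmed data.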
The weight used in the weighted average may be set according to the number of occurrences of each attribute in the history data, or may be set according to another method.
The step of merging the data to be merged and the confirmed data comprises the following steps:
for each matching field in turn, the data to be merged is compared with the confirmed data; if the attribute value of the matching field of the data to be merged is not a null value and is not equal to any attribute value already stored in that matching field of the confirmed data, the current attribute value of the data to be merged is stored in that matching field of the confirmed data;
wherein the data to be merged comprises the first data and the data to be matched.
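The merging step can be sketched as below, again assuming that confirmed data keeps a list of attribute values per matching field. The function name, field keys and phone numbers are made up for illustration.

```python
def merge_into_confirmed(to_merge: dict, confirmed: dict, matching_fields) -> dict:
    """Append each new, non-null attribute value to the confirmed data in place."""
    for field in matching_fields:
        value = to_merge.get(field)
        if value is None or value == "":
            continue  # null values are never stored
        values = confirmed.setdefault(field, [])
        if value not in values:  # only store values not already present
            values.append(value)
    return confirmed


confirmed_example = {"name": ["张三"], "phone": ["13800000000"], "address": []}
merge_into_confirmed(
    {"name": "张三", "phone": "13900000000", "address": None},
    confirmed_example,
    ["name", "phone", "address"],
)
```

After the call, the phone field holds both numbers, the name field is unchanged, and the null address is ignored.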
For example, the content of the data to be merged is shown in table 5, the content of the confirmed data is shown in table 6, and the merged data is shown in table 7.
TABLE 7 Exemplary form of the merged confirmed data (table image: Figure BDA0003251168520000131)
The manner of merging the non-matching fields of the data to be merged and the confirmed data may be set according to actual needs and is not described here.
Based on the method, a CDR system can be developed. The CDR system integrates the medical data of each system of the hospital, and an EMPI (Enterprise Master Patient Index) system establishes a patient master index for the CDR system so that the medical data is managed in a unified way. The accuracy of the patient master index depends on the accuracy of the patient information matching algorithm. The EMPI system provides patient master index generation and patient master index query functions. By using the patient master index in the EMPI system, doctors and related personnel can quickly find all historical treatment records of a patient in the CDR system to assist in disease diagnosis and medical research.
The embodiment also provides a patient master index establishing device, which comprises a matching module, wherein the matching module is used for executing the patient data matching method in the CDR system.
Optionally, the patient master index creating apparatus further includes:
an acquisition module for acquiring original data from at least two service systems, wherein the original data comprises matching fields;
a classification module for classifying the raw data into first data and second data based on a cleaning rule;
a merging module, configured to generate confirmed data from the first data based on a merging rule, where the confirmed data includes a matching field and a primary index field;
an input module for configuring the second data as data to be matched and inputting the data to be matched into the matching module; and
a processing module for processing data based on the matching result of the matching module: if the matching succeeds, the current data to be matched is merged into the matched confirmed data; if the matching fails, temporary index data is generated from the current data to be matched.
The specific workflow of the above device can be understood by referring to the description of the patient master index establishing method in the CDR system in the present specification.
The patient main index establishing device can solve the problem that uniform patient identification does not exist among all business systems in the prior art.
Compared with the prior art, in the patient data matching method, the main index establishing method and the device in the CDR system provided by the invention, the patient data matching method in the CDR system comprises the following steps: acquiring data to be matched and confirmed data; sequentially acquiring at least two similarities for each piece of confirmed data based on at least two combinations of the matching fields; and judging whether the matching is successful based on all the similarities, and obtaining the confirmed data matched with the data to be matched. Based on this patient data matching method, the main index of the patient can be constructed and all historical diagnosis records of the patient can be obtained, which assists diagnosis and medical research and solves the prior-art problem that no uniform patient identification exists among the business systems of a hospital. In addition, using multiple rounds of matching makes the matching result more reliable and allows the method to cope with complex data conditions.
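One of the recited variants of the success judgement, in which the similarities of all rounds are summed per piece of confirmed data and compared with a preset threshold, can be sketched as follows. This is a simplified illustration, not the only arrangement described: each round's similarity is passed in as a callable, and the function name and the threshold value used in the usage example are assumptions.

```python
def match_by_sum(to_match, confirmed_list, round_similarities, sum_threshold):
    """Pick the confirmed data with the largest similarity sum, or None on failure.

    round_similarities: one callable per field combination; each takes
    (to_match, confirmed) and returns that round's similarity.
    """
    best, best_sum = None, None
    for confirmed in confirmed_list:
        total = sum(sim(to_match, confirmed) for sim in round_similarities)
        if best_sum is None or total > best_sum:
            best, best_sum = confirmed, total
    # Matching fails when every sum is below the preset threshold.
    if best is None or best_sum < sum_threshold:
        return None
    return best


def exact(field):
    """Toy per-round similarity: 1.0 on an exact non-empty match, else 0.0."""
    return lambda a, b: 1.0 if a.get(field) and a.get(field) == b.get(field) else 0.0


rounds = [exact("id_number"), exact("phone"), exact("address")]
pool = [
    {"id_number": "X1", "phone": "111", "address": "A"},
    {"id_number": "X2", "phone": "222", "address": "B"},
]
hit = match_by_sum({"phone": "222", "address": "B"}, pool, rounds, 1.5)
miss = match_by_sum({"phone": "222", "address": "A"}, pool, rounds, 1.5)
```

In this toy run, `hit` is the second record because two rounds agree, while `miss` is `None` because no single record reaches the assumed threshold of 1.5.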
The above description is only for the purpose of describing the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention, and any variations and modifications made by those skilled in the art according to the above disclosure are within the scope of the present invention.

Claims (13)

1. A method of patient data matching in a CDR system, comprising:
acquiring data to be matched and confirmed data, wherein the data to be matched comprises a matching field, and the confirmed data comprises the matching field;
sequentially acquiring the ith similarity of the data to be matched and each confirmed data based on the ith combination of the matching fields;
judging whether the matching is successful or not based on all the ith similarity, and if so, obtaining a piece of confirmed data matched with the data to be matched based on all the ith similarity;
wherein, the value range of i is all integers from 1 to n, and n is an integer larger than 1.
2. The method for matching patient data in a CDR system of claim 1, wherein the step of determining whether the matching is successful based on all of the ith similarities comprises: if the ith similarity corresponding to each piece of confirmed data is smaller than or equal to the ith threshold, the matching is failed; otherwise, the matching is successful;
or,
the step of judging whether the matching is successful or not based on all the ith similarity comprises the following steps: if the sum of all the ith similarity corresponding to each piece of confirmed data is smaller than a preset threshold value, the matching fails; otherwise, the matching is successful.
3. The method for matching patient data in a CDR system according to claim 2, wherein said step of obtaining a piece of said confirmed data matching said data to be matched based on all of said ith similarities comprises: selecting the confirmed data with the largest sum of all the ith similarities.
4. The method for matching patient data in a CDR system according to claim 2, wherein said step of obtaining a piece of said confirmed data matching said data to be matched based on all of said ith similarities comprises:
if the ith similarity of at least one piece of confirmed data in the ith set is greater than the ith threshold and i is less than n, forming an (i+1)th set from the confirmed data in the ith set whose ith similarity is greater than the ith threshold, and judging again;
otherwise, selecting the confirmed data with the largest sum of all the ith similarity in the ith set, or selecting the confirmed data with the largest ith similarity in the ith set;
wherein the 1 st set is all the confirmed data.
5. The method for matching patient data in a CDR system according to any of claims 1 to 4, wherein the step of obtaining the ith similarity between the data to be matched and the confirmed data comprises:
and sequentially obtaining a similarity value corresponding to each matching field in the ith combination, wherein the similarity value is weighted and averaged based on the ith weighting parameter to obtain the ith similarity.
6. The method of matching patient data in a CDR system of claim 5, wherein each of said matching fields in said data to be matched stores only one attribute value, and each of said matching fields in said confirmed data stores one or more attribute values; the step of obtaining the similarity value corresponding to the matching field comprises:
and performing similarity calculation on the attribute values in the data to be matched and each attribute value in the corresponding matching field in the confirmed data, and obtaining the similarity value after weighted average of calculation results.
7. The method of claim 5, wherein the 1 st combination comprises a name field, a gender field, and an identification number field, and the 1 st weighting parameter corresponding to the identification number field is greater than 0.5.
8. The method of matching patient data in a CDR system of claim 5, wherein said matching field includes a name field, and the similarity value corresponding to said name field is calculated according to the following formula:
$$\text{similarity} = 1 - \frac{ED_{AB}}{\max(L_A, L_B)}$$
wherein similarity represents the similarity value, $ED_{AB}$ represents the edit distance between A and B, max() represents the maximum operation, $L_A$ represents the string length of A, $L_B$ represents the string length of B, A represents the attribute value stored in the name field of the data to be matched, and B represents the attribute value stored in the name field of the confirmed data.
9. A method for establishing a patient main index in a CDR system is characterized by comprising the following steps:
acquiring original data from at least two service systems, wherein the original data comprises matching fields;
the raw data is classified into first data and second data based on a cleaning rule;
the first data generates confirmed data based on a merging rule, and the confirmed data comprises a matching field and a main index field;
the second data is configured as data to be matched, and the data to be matched obtains a matching result based on a patient data matching method in a CDR system according to any one of claims 1-8;
if the matching is successful, combining the current data to be matched and the matched confirmed data;
and if the matching fails, generating temporary index data by the current data to be matched.
10. The patient master index building method in a CDR system of claim 9, wherein the matching fields include a name field and an identity card field, and wherein the merge rule comprises:
judging whether the identity card fields of the first data and a piece of confirmed data are equal, and judging whether the name fields of the first data and that piece of confirmed data are equal;
if both the identity card fields and the name fields of the current first data and the current piece of confirmed data are equal, merging the current first data and the current confirmed data;
otherwise, the current first data is independently converted into a new confirmed data.
11. The method of claim 10, wherein each of the matching fields of the first data stores only one attribute value, each of the matching fields of the confirmed data stores one or more attribute values, and the step of determining whether the matching fields of the first data and the confirmed data are equal comprises:
if the attribute value of the matching field of the first data is a null value, judging that the attribute values are not equal;
if the attribute value of the matching field of the first data is not a null value and the attribute value of the matching field of the first data is not equal to the attribute value of the matching field of the confirmed data, judging that the attribute values are not equal;
and if the attribute value of the matching field of the first data is not a null value and the attribute value of the matching field of the first data is equal to one of the attribute values of the matching field of the confirmed data, judging that the attribute values are equal.
12. The method of claim 11, wherein each matching field of the data to be matched stores only one attribute value, and the step of merging the data to be merged and the confirmed data comprises:
the data to be merged and the confirmed data are sequentially judged about each matching field, and if the attribute value of the matching field of the data to be merged is not a null value and the attribute value of the matching field of the data to be merged is not equal to the attribute value of the matching field of the confirmed data, the current attribute value of the data to be merged is stored in the matching field of the confirmed data;
wherein the data to be merged comprises the first data and the data to be matched.
13. A patient master index building apparatus comprising a matching module for performing the patient data matching method in the CDR system according to any one of claims 1 to 8.
CN202111045885.3A 2021-09-07 2021-09-07 Patient data matching method in CDR system, main index establishing method and device Active CN113742348B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111045885.3A CN113742348B (en) 2021-09-07 2021-09-07 Patient data matching method in CDR system, main index establishing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111045885.3A CN113742348B (en) 2021-09-07 2021-09-07 Patient data matching method in CDR system, main index establishing method and device

Publications (2)

Publication Number Publication Date
CN113742348A true CN113742348A (en) 2021-12-03
CN113742348B CN113742348B (en) 2024-05-17

Family

ID=78736661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111045885.3A Active CN113742348B (en) 2021-09-07 2021-09-07 Patient data matching method in CDR system, main index establishing method and device

Country Status (1)

Country Link
CN (1) CN113742348B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060026156A1 (en) * 2004-07-28 2006-02-02 Heather Zuleba Method for linking de-identified patients using encrypted and unencrypted demographic and healthcare information from multiple data sources
CN101727535A (en) * 2008-10-30 2010-06-09 北大方正集团有限公司 Cross indexing method for patients crossing system and system thereof
CN103870668A (en) * 2012-12-17 2014-06-18 上海联影医疗科技有限公司 Method and device for establishing master patient index oriented to regional medical treatment
WO2015169597A1 (en) * 2014-05-07 2015-11-12 Cytolon Ag Methods and systems for predicting alloreactivity in transplantation
CN106650259A (en) * 2016-12-22 2017-05-10 深圳中兴网信科技有限公司 Patient information management method and management system
CN109739862A (en) * 2019-01-07 2019-05-10 深圳中兴网信科技有限公司 Main index of patients weight method for building up, Main index of patients weight establish system
CN110197724A (en) * 2019-03-12 2019-09-03 平安科技(深圳)有限公司 Predict the method, apparatus and computer equipment in diabetes illness stage
KR102055309B1 (en) * 2018-10-30 2019-12-13 재단법인 아산사회복지재단 Method and system for identifying patient
CN111739634A (en) * 2020-05-14 2020-10-02 平安科技(深圳)有限公司 Method, device and equipment for intelligently grouping similar patients and storage medium
CN111768821A (en) * 2020-05-29 2020-10-13 上海森亿医疗科技有限公司 Distributed patient record matching method, system and terminal
CN111785341A (en) * 2020-06-30 2020-10-16 平安国际智慧城市科技股份有限公司 Patient main index data merging method and device based on similarity
CN112286912A (en) * 2020-08-12 2021-01-29 上海柯林布瑞信息技术有限公司 Medical data quality checking method and device, terminal and storage medium
CN112863626A (en) * 2021-03-08 2021-05-28 北京冠新医卫软件科技有限公司 Multi-platform similar medical data removing method, device and equipment
US20210166795A1 (en) * 2018-11-08 2021-06-03 Express Scripts Strategic Development, Inc. Systems and methods for patient record matching
CN112967799A (en) * 2021-03-30 2021-06-15 广州启生信息技术有限公司 Doctor data processing method and platform
CN113130038A (en) * 2021-04-30 2021-07-16 康键信息技术(深圳)有限公司 Medicine data matching method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
S. PAEMBONAN等: "Combination of K-Means and Profile Matching for Drag Substitution", 2018 2ND EAST INDONESIA CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (EICONCIT), vol. 2018, 24 October 2019 (2019-10-24), pages 180 - 183 *
吴艳艳等: "基于IHE PIX/PDQ框架构建临床数据中心MPI***的应用研究", 中国数字医学, vol. 10, no. 2, 15 February 2015 (2015-02-15), pages 25 - 28 *
陈震涛: "多源医疗数据分析模型研究及实现", 中国优秀硕士学位论文全文数据库 (医药卫生科技辑), vol. 2021, no. 4, 15 April 2021 (2021-04-15), pages 054 - 24 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969124A (en) * 2022-06-15 2022-08-30 上海柯林布瑞信息技术有限公司 Search index construction method and device based on data mapping table
CN115269613A (en) * 2022-09-27 2022-11-01 四川互慧软件有限公司 Patient main index construction method, system, equipment and storage medium
CN115269613B (en) * 2022-09-27 2023-01-13 四川互慧软件有限公司 Patient main index construction method, system, equipment and storage medium
CN116072303A (en) * 2023-04-03 2023-05-05 南京吾爱网络技术有限公司 Medical information card data identification system and method for hospital information department
CN116072303B (en) * 2023-04-03 2023-06-02 南京吾爱网络技术有限公司 Medical information card data identification system and method for hospital information department

Also Published As

Publication number Publication date
CN113742348B (en) 2024-05-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant