CN112712899A - Data analysis method based on primary liver cancer big data and storage medium - Google Patents

Data analysis method based on primary liver cancer big data and storage medium

Publication number
CN112712899A
Authority
CN
China
Prior art keywords
data
report
patient
image
nodule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110047657.3A
Other languages
Chinese (zh)
Inventor
刘景丰
刘红枝
郭鹏飞
陈振伟
李保晟
李海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mengchao Hepatobiliary Hospital Of Fujian Medical University (fuzhou Hospital For Infectious Diseases)
Fuzhou Yixing Dashuju Industry Investment Co ltd
Original Assignee
Mengchao Hepatobiliary Hospital Of Fujian Medical University (fuzhou Hospital For Infectious Diseases)
Fuzhou Yixing Dashuju Industry Investment Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mengchao Hepatobiliary Hospital Of Fujian Medical University (fuzhou Hospital For Infectious Diseases) and Fuzhou Yixing Dashuju Industry Investment Co ltd
Priority to CN202110047657.3A
Publication of CN112712899A
Legal status: Pending

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu

Landscapes

  • Public Health (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The application relates to the technical field of big data processing, in particular to a data analysis method and a storage medium based on primary liver cancer big data. The method comprises the following steps: acquiring analysis data from an electronic medical record system, an image acquisition and output system and an inspection information system; preprocessing population information to obtain patient index number information; preprocessing the image report and performing structured analysis processing to obtain the nodule attributes; preprocessing the inspection report to obtain alpha fetoprotein inspection data; summarizing the patient index number information, the image report and the alpha fetoprotein inspection data to obtain summarized data; and analyzing the summarized data according to a preset rule to obtain a patient condition analysis result. The technical scheme of the invention extracts data from different systems, normalizes and unifies the data, screens out key index information, analyzes the hepatocellular carcinoma condition of the patient according to a preset rule and gives diagnosis and treatment suggestions, thereby improving working efficiency and diagnostic accuracy.

Description

Data analysis method based on primary liver cancer big data and storage medium
Technical Field
The application relates to the technical field of big data processing, in particular to a data analysis method and a storage medium based on primary liver cancer big data.
Background
Hepatocellular carcinoma (liver cancer) is the 4th most common malignant tumor in China and ranks 2nd among malignant tumors in mortality. Surgical resection is the first choice for treating liver cancer, but liver cancer patients usually have no abnormal clinical manifestations in the early stage, so the disease has often progressed to the middle or late stage by the time it is diagnosed, and only 15%-30% of patients can be treated by radical surgery. Screening of the high-risk population helps achieve early detection, early diagnosis and early treatment of liver cancer, and is the key to improving its curative effect. The high-risk population for liver cancer in China, including people with Hepatitis B Virus (HBV) and/or Hepatitis C Virus (HCV) infection, excessive drinking, non-alcoholic steatohepatitis, or cirrhosis from various other causes, exceeds 100 million. In order to improve the early diagnosis rate of liver cancer, the Primary Liver Cancer Diagnosis and Treatment Standard (2019) is applied to the screening process for high-risk groups.
However, when patients at high risk of liver cancer visit other departments, doctors outside the liver specialty vary in their knowledge of the liver cancer screening process and diagnostic standard. Similarly, primary-care doctors may make errors in screening and diagnosing liver cancer patients because they lack channels for updating their knowledge and have insufficient experience in diagnosing specialized diseases, which affects the early diagnosis of the patient. In addition, with the number of people at high risk of liver cancer increasing, liver disease specialists are under heavy working pressure.
With the rapid development of big data technologies, artificial intelligence and the standard liver cancer diagnosis path are being fused with each other. Automatically analyzing liver cancer pathological data on a primary liver cancer big data platform (PLCBD), and developing an analysis model with high screening efficiency and accurate analysis results to serve as an auxiliary diagnosis and treatment tool, has become a problem that urgently needs to be solved in order to improve the curative effect of liver cancer treatment.
Disclosure of Invention
The invention aims to provide a data analysis method and a storage medium based on primary liver cancer big data, and the method and the storage medium rely on a primary liver cancer big data platform to capture and standardize data of an electronic medical record system, an imaging system and an inspection system in real time, obtain indexes required by liver cancer diagnosis and realize automatic screening and analysis of liver cancer.
In order to solve the technical problems, the invention provides a data analysis method based on primary liver cancer big data, which comprises the following steps:
step 1, acquiring analysis data from an electronic medical record system, an image acquisition and output system and a laboratory examination information system; the analysis data comprises population information, clinical clinic visit information, inspection reports and image reports; the population information comprises outpatient service population information and hospitalization population information;
step 2, preprocessing the outpatient population information and the hospitalization population information to obtain unique index number information of the patient;
step 3, preprocessing the image report; the image report comprises an ultrasonic report, a CT report, a magnetic resonance report and a Primovist-enhanced magnetic resonance (EOB-MRI) report;
step 4, performing structured analysis processing on the image report to obtain a nodule attribute; the nodule attributes include a nodule property, a nodule feature, and a nodule size;
step 5, preprocessing the inspection report to obtain alpha fetoprotein inspection data;
step 6, summarizing the patient index number information, the image report and the alpha fetoprotein test data to obtain summarized data;
and step 7, analyzing the summarized data according to a preset rule to obtain an illness state analysis result of each patient.
Further, the preprocessing the outpatient service population information and the hospitalization population information to obtain the patient index number information comprises the following steps:
step 21, carrying out data verification on the outpatient service population information and the hospitalization population information;
step 22, combining the outpatient service population information and the hospitalization population information and filtering repeated data;
step 23, creating a unique index number for each patient by adopting a Hash algorithm;
and step 24, associating the unique index number of the patient with the clinical visit information to form patient index number information.
Further, the preprocessing the image report includes the following steps:
step 31, cleaning up error data in the image report;
step 32, screening and filtering the data in the image report;
and step 33, splitting the image report into an ultrasonic report, a CT report, a magnetic resonance report and a Primovist-enhanced magnetic resonance (EOB-MRI) report according to the scanning mode of the image report.
Further, the structural analysis processing of the image report to obtain the nodule attribute comprises the following steps:
step 41, analyzing the properties and characteristics of the nodules from the inspection results of the image reports;
step 42, analyzing the size of the nodule from the inspection result of the image report;
step 43, the nodule properties, the nodule characteristics and the nodule size are associated and saved into the image report.
Further, the preprocessing the test report to obtain alpha-fetoprotein test data comprises the following steps:
step 51, cleaning up error data in the inspection report;
step 52, screening the data in the inspection report to filter out the alpha fetoprotein inspection data.
Further, the analyzing the summarized data according to a preset rule to obtain an analysis result of the disease condition of each patient includes the following steps:
step 701, judging whether the patient has image data, if so, executing step 702, otherwise, skipping to step 703 to continue executing;
step 702, judging whether a solid nodule exists according to the nodule property of the image data, if so, skipping to step 709 to continue execution, otherwise, skipping to step 704 to continue execution;
step 703, setting the analysis result as a patient recommended to carry out ultrasonic examination, and jumping to step 722 to continue execution;
step 704, judging whether the alpha fetoprotein detection result is positive, if so, executing step 705, otherwise, skipping to step 708 to continue executing;
step 705, judging whether the MRI image and the CT image are finished, if so, executing step 706, otherwise, skipping to step 707 to continue executing;
step 706, setting the analysis result to suggest that the patient carries out alpha fetoprotein detection and image follow-up for 2-3 months, and skipping to step 722 to continue execution;
step 707, setting the analysis result to suggest that the patient performs the review after completing the MRI image and the CT image, and skipping to step 722 to continue the execution;
step 708, setting the analysis result to suggest the patient to follow-up for 6 months, and skipping to step 722 to continue execution;
step 709, counting the maximum diameters of all the image nodules, if the maximum diameter of the nodule is larger than 2 cm, executing step 710, otherwise, executing step 713;
step 710, calculating the total number of the image examinations of the patient with the typical characteristics of liver cancer, if the total number is greater than or equal to 1 item, executing step 711, otherwise executing step 712;
step 711, setting the analysis result as primary liver cancer, and skipping to step 722 to continue execution;
step 712, counting the total number of the benign features of the patient nodule, and skipping to step 714 to continue execution;
step 713, calculating the total number of the image examinations of the patient with the typical characteristics of the liver cancer, if the total number is greater than or equal to 2 items, skipping to step 711 to continue execution, otherwise, skipping to step 712 to continue execution;
step 714, judging whether the benign tumor feature quantity is greater than or equal to 1 item, if so, executing step 715, otherwise, executing step 716;
step 715, setting the analysis result as benign tumor, recommending the patient to carry out 1 time of ultrasonic and serum AFP detection every 6 months, and skipping to step 722 to continue execution;
step 716, determining whether the enhanced MRI check is completed, if not, executing step 717, otherwise, executing step 718;
step 717, setting the analysis result as a patient recommendation for carrying out the enhanced MRI examination, and skipping to step 722 for continuing execution;
step 718, determining whether the ultrasound contrast examination is completed, if so, executing step 719, otherwise, executing step 721;
step 719, judging whether the EOB-MRI check is completed, if so, executing step 706, otherwise, executing step 720;
step 720, setting the analysis result to suggest that the patient carries out a Primovist-enhanced MRI (EOB-MRI) examination, and skipping to step 722 to continue execution;
step 721, setting the analysis result as suggesting the patient to carry out the ultrasound contrast examination, and skipping to step 722 to continue execution;
and step 722, returning an analysis result, and ending the analysis flow.
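The branching in steps 701 to 722 can be sketched as a single function. This is a minimal illustration, not the patented implementation: the `PatientSummary` fields, the `analyze` function and the wording of the returned strings are hypothetical names chosen here, and each field is assumed to have been derived already from the summarized data.

```python
from dataclasses import dataclass


@dataclass
class PatientSummary:
    """Hypothetical per-patient view of the summarized data."""
    has_imaging: bool = False
    has_solid_nodule: bool = False
    afp_positive: bool = False
    mri_and_ct_done: bool = False
    max_nodule_cm: float = 0.0
    typical_hcc_exam_count: int = 0  # imaging exams showing typical liver cancer features
    benign_feature_count: int = 0
    enhanced_mri_done: bool = False
    ceus_done: bool = False          # ultrasound contrast examination
    eob_mri_done: bool = False


# Result text of step 706 (reached from step 705 and from step 719).
FOLLOW_UP_AFP_IMAGING = "recommend AFP testing and imaging follow-up (2-3 months)"


def analyze(p: PatientSummary) -> str:
    if not p.has_imaging:                                   # step 701
        return "recommend ultrasound examination"           # step 703
    if not p.has_solid_nodule:                              # step 702
        if not p.afp_positive:                              # step 704
            return "recommend follow-up for 6 months"       # step 708
        if p.mri_and_ct_done:                               # step 705
            return FOLLOW_UP_AFP_IMAGING                    # step 706
        return "recommend review after completing MRI and CT"  # step 707
    # Step 709: nodules larger than 2 cm need 1 typical exam (step 710),
    # otherwise 2 typical exams are required (step 713).
    required = 1 if p.max_nodule_cm > 2 else 2
    if p.typical_hcc_exam_count >= required:
        return "primary liver cancer"                       # step 711
    if p.benign_feature_count >= 1:                         # steps 712, 714
        return "benign tumor; ultrasound and serum AFP every 6 months"  # step 715
    if not p.enhanced_mri_done:                             # step 716
        return "recommend enhanced MRI examination"         # step 717
    if not p.ceus_done:                                     # step 718
        return "recommend ultrasound contrast examination"  # step 721
    if p.eob_mri_done:                                      # step 719
        return FOLLOW_UP_AFP_IMAGING                        # step 706
    return "recommend EOB-MRI examination"                  # step 720
```

Every `return` corresponds to the jump to step 722 in the text, so the function has a single exit meaning: the analysis result is returned and the flow ends.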
Further, the data analysis method based on the primary liver cancer big data further comprises the following step:
step 8, displaying the disease condition analysis result on a terminal interface.
Accordingly, the present application also provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the data analysis method based on primary liver cancer big data according to any one of claims 1 to 7.
Different from the prior art, the technical scheme of the invention has the following beneficial effects:
1. The patient data dispersed in different systems are extracted, normalized and unified, the key index information required by subsequent analysis is screened out, and a primary liver cancer big data platform is established, providing complete and accurate original data for the automatic analysis of hepatocellular carcinoma.
2. Based on the established primary liver cancer big data platform and according to the preset analysis and judgment rules, the hepatocellular carcinoma condition of each patient can be analyzed efficiently and automatically and diagnosis and treatment guidance can be given, assisting doctors in diagnosing and treating liver cancer and improving both working efficiency and diagnostic accuracy.
Drawings
FIG. 1 is a flow chart of the steps of the data analysis method based on the primary liver cancer big data of the present invention.
FIG. 2 is a flowchart illustrating the steps of pre-processing the outpatient population information and the hospitalization population information to obtain the patient index number information according to the present invention.
FIG. 3 is a flow chart of the steps of preprocessing the image report according to the present invention.
FIG. 4 is a flowchart illustrating the steps of performing a structural analysis process on the image report to obtain nodule attributes according to the present invention.
FIG. 5 is a flow chart of the steps of the present invention for preprocessing the assay report to obtain alpha-fetoprotein assay data.
FIG. 6 is a flowchart illustrating the steps of analyzing the summarized data according to a predetermined rule to obtain the disease analysis result of each patient according to the present invention.
FIG. 7 is a diagram showing the analysis result of the disease condition of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, it is a flow chart of the steps of the data analysis method based on the primary liver cancer big data of the present invention, comprising the following steps:
step 1, acquiring analysis data from an electronic medical record system, an image acquisition and output system and a laboratory examination information system; the analysis data includes population information, clinical visit information, inspection reports, and image reports. The population information includes outpatient population information and hospitalization population information.
In a medical information system, patient data are usually distributed across different subsystems. For example, patient outpatient information is kept in the electronic medical record system (EMR), data from the image examinations performed on a patient are stored in the image acquisition and output system (PACS), and data from the tests performed on a patient are stored in the laboratory examination information system (LIS). Before the data can be used for analysis, the data distributed across these systems must first be extracted to a unified platform. The general steps include:
1. Obtain database accounts, or hot backup database accounts, for the EMR, the PACS and the LIS, and grant those accounts access to the population information, clinical visit information, inspection reports and image reports.
2. Construct a database system (namely the target database) of the same type as the source database, and create local backup tables for the required data using the SQL statement CREATE TABLE AS SELECT (hereinafter abbreviated as CTAS). Preferably, to avoid impact on the business systems, the query frequency and the queried time range used when creating the temporary tables are limited: by default the backup is refreshed every 30 minutes, and the query time range for all data other than population information is limited to 180 days.
3. When the data types of the databases differ and a data type cannot be read during a query, use the data type conversion function CONVERT to convert the format, so that no data content is lost when the databases are synchronized.
4. For different types of source databases, including SQL Server, MySQL and the like, use database synchronization technology such as Oracle Gateway, Oracle GoldenGate or ODBC so that the data can be queried uniformly on the target database system.
5. Create the data backup tables for population information, clinical visit information, inspection reports and image reports on the target database using the CTAS method.
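The CTAS backup with a limited time window can be illustrated as follows. This is a sketch under stated assumptions: it uses an in-memory SQLite database from the Python standard library in place of the production source and target databases, and the table names, column names and the `backup_recent` helper are hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def backup_recent(conn, source, target, date_col, days=180):
    """Rebuild a local backup table with CTAS, keeping only recent rows.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted configuration, never from user input.
    """
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(
        f"CREATE TABLE {target} AS "
        f"SELECT * FROM {source} "
        f"WHERE {date_col} >= date('now', '-{int(days)} days')"
    )
    conn.commit()


# Hypothetical source table standing in for the PACS report table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_image_report (visit_no TEXT, exam_date TEXT)")
today = date.today()
conn.executemany(
    "INSERT INTO src_image_report VALUES (?, ?)",
    [
        ("V001", (today - timedelta(days=10)).isoformat()),   # inside the window
        ("V002", (today - timedelta(days=400)).isoformat()),  # outside the window
    ],
)
backup_recent(conn, "src_image_report", "bak_image_report", "exam_date")
rows = conn.execute("SELECT visit_no FROM bak_image_report").fetchall()
```

In a deployment along the lines described above, a scheduler would call such a function at the default 30-minute interval; the scheduling itself is outside this sketch.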
Step 2, preprocessing the outpatient population information and the hospitalization population information to obtain unique index number information of the patient; as shown in fig. 2, the present invention is a flowchart of the steps of preprocessing the outpatient population information and the hospitalization population information to obtain the patient index number information, and includes the following steps:
step 21, carrying out data verification on the outpatient service population information and the hospitalization population information;
In order to ensure the accuracy of the patient information and avoid abnormal problems in the subsequent analysis process, the outpatient service population information and the hospitalization population information need to be verified, and abnormal data in them filtered out. In particular, processing of the key population information fields typically includes:
name processing, namely clearing numbers, spaces and special characters;
birthday processing, namely converting dates in character string format into a unified date format with the to_date function, marking the field as abnormal if it cannot be converted, and marking as abnormal any record whose birthday is earlier than 1900 or later than the current date;
sex processing, namely converting numeric sex codes to male or female, and marking values that cannot be converted as abnormal;
identity card processing, wherein the identity card number should be a 15-digit or 18-digit number that passes the address code, date of birth and check digit checks, and the record is marked as abnormal if the identity card information is empty or fails the check;
visit number processing, wherein data that does not conform to the visit number format specification is marked as abnormal;
consistency checking, namely checking whether the sex, the birthday and the identity card of the patient are consistent, and if they are not, giving priority to the identity card information and recording the abnormality;
exception handling, namely emptying the fields marked as abnormal and retaining the other fields of the record;
repeated personal information processing, namely sorting and merging repeated population information using field similarity matching methods (Smith-Waterman algorithm, edit distance and cosine similarity).
After this series of processing, the resulting outpatient service population information table and hospitalization population information table include the visit number (outpatient service number or hospitalization number), name, birthday, sex, identity card, hospital visited, visit type, and the like.
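A few of the field checks above can be sketched in Python. The function names and accepted date formats are assumptions made for illustration, and only the 18-digit identity card form is covered. The check digit rule used is the standard one for 18-digit resident identity numbers (weighted sum modulo 11); the address code check is omitted.

```python
import re
from datetime import date, datetime
from typing import Optional

# Weights and check characters for the 18-digit identity number.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")  # assumed input formats


def clean_name(name: str) -> str:
    """Remove digits, whitespace and special characters from a name."""
    return re.sub(r"[\d\s\W_]+", "", name)


def parse_birthday(text: str) -> Optional[date]:
    """Return a date, or None when the value should be marked abnormal."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt).date()
        except ValueError:
            continue
        if parsed.year < 1900 or parsed > date.today():
            return None
        return parsed
    return None


def valid_id18(id_no: str) -> bool:
    """Check format, embedded date of birth and check digit of an 18-digit ID."""
    if not re.fullmatch(r"\d{17}[\dXx]", id_no):
        return False
    total = sum(int(d) * w for d, w in zip(id_no[:17], WEIGHTS))
    if CHECK_CHARS[total % 11] != id_no[17].upper():
        return False
    return parse_birthday(id_no[6:14]) is not None
```

A record whose field fails one of these checks would have that field emptied and the rest retained, as described in the exception handling item.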
Step 22, combining the outpatient service population information and the hospitalization population information and filtering repeated data; typically, the outpatient and hospitalization population information tables are merged with the SQL UNION ALL operator. The repeated data that exist after merging are then processed: the records in the database are sorted, duplicates are detected by comparing whether adjacent records are similar, and the personal information of patients with the same name, identity card, gender and birthday is merged using a duplicate-elimination algorithm (priority queue algorithm, sorted-neighborhood algorithm, multi-pass sorted-neighborhood algorithm and the like) or a fuzzy matching strategy.
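The sort-then-compare-neighbors idea can be sketched as below. This is an illustrative simplification: it compares only directly adjacent records (a window of two), uses `difflib.SequenceMatcher` from the standard library as a stand-in for the similarity measures named in the text, and the record layout, field names and 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher

FIELDS = ("id_card", "name", "gender", "birthday")


def match_key(record):
    """Concatenate the identifying fields into one comparable string."""
    return "|".join(str(record.get(f) or "") for f in FIELDS)


def merge_and_dedupe(outpatients, inpatients, threshold=0.9):
    # List concatenation plays the role of UNION ALL.
    records = sorted(outpatients + inpatients, key=match_key)
    merged = []
    for rec in records:
        is_duplicate = bool(merged) and SequenceMatcher(
            None, match_key(merged[-1]), match_key(rec)
        ).ratio() >= threshold
        if is_duplicate:
            # Keep one person record and collect all of its visit numbers.
            merged[-1]["visit_nos"] = merged[-1]["visit_nos"] + rec["visit_nos"]
        else:
            merged.append(dict(rec))
    return merged


outpatients = [
    {"id_card": "11010519491231002X", "name": "Zhang San", "gender": "F",
     "birthday": "1949-12-31", "visit_nos": ["O1"]},
]
inpatients = [
    {"id_card": "11010519491231002X", "name": "Zhang San", "gender": "F",
     "birthday": "1949-12-31", "visit_nos": ["I9"]},
    {"id_card": "", "name": "Li Si", "gender": "M",
     "birthday": "1980-02-03", "visit_nos": ["I7"]},
]
people = merge_and_dedupe(outpatients, inpatients)
```

A multi-pass variant would repeat the same loop with different sort keys (for example name first, then birthday first) so that records separated by a typo in the leading field can still become neighbors.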
Step 23, creating a unique index number for each patient by adopting a Hash algorithm;
The primary index used varies from one medical subsystem to another; for example, a hospitalization number cannot be used to query the laboratory tests a patient had at an outpatient clinic. Therefore, in the embodiment of the invention, a master index is established as the unique identifier of each patient, so that the patient's records in the databases of the different information systems are effectively associated, ensuring the consistency of the patient's personal information across the medical information systems and information sharing during outpatient service or hospitalization. In particular, the patient master index (EMPI) is created from the population information through a Hash algorithm.
Step 24, associating the unique index number of the patient with the clinical visit information to form patient index number information, wherein the content of the patient index number information comprises the EMPI, the visit number, the name, the ID card, the gender, the birthday and the admission date.
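Steps 23 and 24 can be illustrated with a short sketch. The text does not specify which hash function or which fields are hashed, so SHA-256 over the normalized name, identity card, gender and birthday is an assumption, as are the function names and the truncation to 16 hexadecimal characters.

```python
import hashlib


def make_empi(name, id_card, gender, birthday):
    """Derive a deterministic patient index from normalized identity fields."""
    normalized = "|".join(
        str(value or "").strip().upper()
        for value in (name, id_card, gender, birthday)
    )
    # Truncation keeps the index short; it raises collision risk slightly.
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def attach_empi(patients, visits):
    """Associate each clinical visit with the EMPI of its patient."""
    empi_by_visit = {}
    for p in patients:
        empi = make_empi(p["name"], p["id_card"], p["gender"], p["birthday"])
        for visit_no in p["visit_nos"]:
            empi_by_visit[visit_no] = empi
    return [{**v, "empi": empi_by_visit.get(v["visit_no"])} for v in visits]


patients = [
    {"name": "Zhang San", "id_card": "11010519491231002X", "gender": "F",
     "birthday": "1949-12-31", "visit_nos": ["O1", "I9"]},
]
visits = [
    {"visit_no": "O1", "admission_date": "2021-01-05"},
    {"visit_no": "I9", "admission_date": "2021-01-08"},
]
indexed_visits = attach_empi(patients, visits)
```

Because the index is computed from the identity fields rather than assigned sequentially, the same patient receives the same EMPI whether the record arrives from the outpatient or the hospitalization side.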
Step 3, preprocessing the image report; the image report comprises an ultrasonic report, a CT report, a magnetic resonance report and a Primovist-enhanced magnetic resonance (EOB-MRI) report; referring to fig. 3, which is a flowchart of the steps of preprocessing the image report according to the present invention, the preprocessing includes the following steps:
step 31, cleaning up error data in the image report; for example, error data processing includes removing records that are missing key information, where the key information includes the image index, examination date, imaging findings, imaging diagnosis, scan mode and scan location. As another example, records whose fields contain text that does not conform to the image report format rules are deleted.
Step 32, screening and filtering the data in the image report; the data in the image report are screened and filtered as required to reduce the data volume and improve the efficiency and accuracy of subsequent analysis. For example, according to the validity period required of images for liver cancer diagnosis, the image reports whose examination date is within 30 days are kept; the scanned part is also screened, so that image reports of the abdomen are kept and image reports of other parts are removed. As another example, screening is performed by scan mode: it is first determined whether an ultrasound image is contrast-enhanced ultrasound (CEUS), whether a CT image is an enhanced CT scan, and whether a magnetic resonance image is an enhanced magnetic resonance scan; the image records are then sorted, and each type of image report is screened according to the priority order of the scan mode, namely: CEUS before ordinary ultrasound, enhanced CT before ordinary CT, and enhanced MRI before ordinary MRI; if the same type of image report has several records, the latest one is selected according to the scan date.
And step 33, splitting the image report into an ultrasonic report, a CT report, a magnetic resonance report and a Primovist-enhanced magnetic resonance (EOB-MRI) report according to the scanning mode of the image report.
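The screening rules of step 32 and the split of step 33 can be sketched together. The scan mode codes, the record layout and the `screen_reports` name are hypothetical; the sketch keeps, for each report type, the highest-priority scan mode and then the most recent examination.

```python
from datetime import date, timedelta

# scan mode -> (report type, rank); a lower rank means a higher priority.
PRIORITY = {
    "CEUS": ("ultrasound", 0),
    "US": ("ultrasound", 1),
    "CT_ENHANCED": ("ct", 0),
    "CT": ("ct", 1),
    "MRI_ENHANCED": ("mri", 0),
    "MRI": ("mri", 1),
    "EOB_MRI": ("eob_mri", 0),
}


def screen_reports(reports, today, max_age_days=30):
    """Return one report per type: abdomen only, recent, best scan mode."""
    cutoff = today - timedelta(days=max_age_days)
    best = {}
    for r in reports:
        if r["site"] != "abdomen" or not (cutoff <= r["exam_date"] <= today):
            continue
        if r["scan_mode"] not in PRIORITY:
            continue
        report_type, rank = PRIORITY[r["scan_mode"]]
        # Prefer the enhanced scan mode first, then the later examination date.
        sort_key = (-rank, r["exam_date"])
        if report_type not in best or sort_key > best[report_type][0]:
            best[report_type] = (sort_key, r)
    return {report_type: r for report_type, (_, r) in best.items()}


today = date(2021, 1, 10)
reports = [
    {"scan_mode": "US", "site": "abdomen", "exam_date": date(2021, 1, 8)},
    {"scan_mode": "CEUS", "site": "abdomen", "exam_date": date(2021, 1, 2)},
    {"scan_mode": "CT", "site": "chest", "exam_date": date(2021, 1, 9)},
    {"scan_mode": "MRI", "site": "abdomen", "exam_date": date(2020, 11, 1)},
    {"scan_mode": "CT", "site": "abdomen", "exam_date": date(2021, 1, 3)},
    {"scan_mode": "CT", "site": "abdomen", "exam_date": date(2021, 1, 5)},
]
selected = screen_reports(reports, today)
```

The returned dictionary is already keyed by report type, which corresponds to the split into separate ultrasonic, CT, magnetic resonance and EOB-MRI reports.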
Step 4, performing structured analysis processing on the image report to obtain a nodule attribute; the nodule attributes include a nodule property, a nodule feature, and a nodule size; fig. 4 is a flowchart illustrating a procedure of performing a structural analysis process on the image report to obtain a nodule attribute according to the present invention, which includes the following steps:
step 41, analyzing the properties and characteristics of the nodules from the inspection results of the image reports; since the examination conclusion of an image report contains the doctor's text description of the image analysis result, the method of the present application needs to extract the key information from the conclusion text of each patient's image report for subsequent analysis. The specific extraction method is as follows: first, splitting the content of the examination conclusion into clauses; determining the body part described in each clause using the body part lexicon in ICD10 (International Classification of Diseases, 10th revision) to screen out the clauses related to the liver; using keywords to exclude clauses that describe post-treatment status, wherein the keywords comprise excision, postoperative, intervention, TACE, radio frequency, ablation, comprehensive treatment, radiotherapy and postoperative change; using keywords to match the clauses that describe nodule properties, wherein the keywords comprise MT, hepatocellular carcinoma, liver cancer, HCC, malignancy, ICC, recurrence, abnormal enhancement focus, abnormal enhancement shadow, cyst, nodule and focus; and classifying the nodule property as cystic or solid according to the nodule property keywords in the clause. If no keyword matches, the nodule property is none. At the same time, the nodule feature is classified as typical liver cancer, atypical, or benign tumor.
Step 42, analyzing the size of the nodule from the inspection result of the image report; specifically: splitting the content of the examination findings into clauses; determining the body part described in each clause using the body part lexicon in ICD10, and screening out the clauses related to the liver; using keywords to match the clauses that contain nodule descriptions, wherein the keywords comprise density shadow, nodule shadow, echo, enhancement range, enhancement shadow, abnormal signal, signal shadow, rapid wash-in and wash-out, and the like; and using a regular expression to extract the length and width values from the clauses that describe nodules, keeping the maximum value if there are several.
Step 43, the nodule properties, the nodule characteristics and the nodule size are associated and saved into the image report. Specifically: screening the clauses of the examination conclusion that contain solid nodules, and extracting their nodule description keywords; matching these keywords against the nodule description keywords in the examination findings, so that the clauses of the examination conclusion correspond one-to-one with the clauses of the examination findings; obtaining the nodule size once the association is complete; if there are several nodules, taking the maximum of all the nodule sizes; the nodule properties, nodule characteristics and nodule size analyzed from each patient's image report are added to the ultrasonic report, CT report, magnetic resonance report or Primovist-enhanced magnetic resonance (EOB-MRI) report, respectively, for direct use in the subsequent analysis process.
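The clause splitting, keyword matching and regular-expression size extraction of steps 41 and 42 can be sketched as follows. Several simplifications are assumed: English stand-in keywords replace the Chinese report vocabulary, a plain substring test for "liver" or "hepat" replaces the ICD10 body part lexicon, the mapping of keywords to cystic or solid is a guess, and property and size are read from the same clause rather than associated across conclusion and findings as in step 43.

```python
import re

EXCLUDE = ("excision", "postoperative", "intervention", "tace",
           "radio frequency", "ablation", "radiotherapy")
CYSTIC = ("cyst",)
SOLID = ("hepatocellular carcinoma", "liver cancer", "hcc", "malignancy",
         "recurrence", "abnormal enhancement", "nodule", "focus")

# Split on sentence punctuation, but not on a decimal point such as "2.5".
CLAUSE_RE = re.compile(r"[;；。]|\.(?!\d)")
# One or more numbers joined by x / × / *, followed by a unit.
SIZE_RE = re.compile(
    r"((?:\d+(?:\.\d+)?\s*[x×*]\s*)*\d+(?:\.\d+)?)\s*(cm|mm)", re.IGNORECASE
)
NUM_RE = re.compile(r"\d+(?:\.\d+)?")


def split_clauses(text):
    return [c.strip() for c in CLAUSE_RE.split(text) if c.strip()]


def nodule_property(clause):
    """Return 'cystic', 'solid' or None for one clause."""
    lowered = clause.lower()
    if any(k in lowered for k in EXCLUDE):
        return None
    if any(k in lowered for k in CYSTIC):
        return "cystic"
    if any(k in lowered for k in SOLID):
        return "solid"
    return None


def max_size_cm(clause):
    """Largest dimension mentioned in the clause, in centimetres."""
    sizes = []
    for dims, unit in SIZE_RE.findall(clause):
        divisor = 10 if unit.lower() == "mm" else 1
        sizes.extend(float(n) / divisor for n in NUM_RE.findall(dims))
    return max(sizes) if sizes else None


def analyze_report(text):
    """List (property, size) pairs for the liver-related clauses."""
    results = []
    for clause in split_clauses(text):
        lowered = clause.lower()
        if "liver" not in lowered and "hepat" not in lowered:
            continue
        prop = nodule_property(clause)
        if prop is not None:
            results.append((prop, max_size_cm(clause)))
    return results
```

For a patient with several solid nodules, the maximum over the returned sizes would be the value compared against the 2 cm threshold in step 709.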
Step 5, preprocessing the inspection report to obtain alpha-fetoprotein test data. Fig. 5 is a flowchart of the steps of preprocessing the inspection report to obtain the alpha-fetoprotein test data according to the present invention, which includes the following steps:
Step 51, cleaning up error data in the inspection report. Specifically, the processing of logically erroneous data in the inspection report includes clearing the records with missing key information, wherein the key information comprises the test date, the test sample, the test name, the report date, the test result, the reference range and the unit. In addition, records containing logical errors in any field are deleted according to the format rules of the inspection report.
Step 52, screening the data in the inspection report to filter out the alpha-fetoprotein test data. The screening and filtering usually includes the following: screening the test data, namely retrieving the records whose test name is alpha-fetoprotein and whose test sample is serum, and removing the remaining test data; processing the time range, namely screening the alpha-fetoprotein test reports whose test date falls within 30 days, according to the validity period required for liver cancer diagnosis; and screening by test date, namely, if the same patient has several test records, sorting them in reverse order of test date and selecting the latest one. After this processing, the alpha-fetoprotein test data are obtained, and the key contents of the data comprise the EMPI, the visit number, the alpha-fetoprotein test result, the alpha-fetoprotein test date, the alpha-fetoprotein test unit and the alpha-fetoprotein reference range.
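The cleaning and screening of steps 51 and 52 can be sketched as follows. This is a simplified illustration rather than the method of the present application: the record field names, the exact test-name string and the use of a caller-supplied reference date for the 30-day window are assumptions, because the description does not state the date from which the window is measured.

```python
from datetime import date, timedelta

# Hypothetical field names for the key information listed in step 51.
REQUIRED_FIELDS = ("empi", "test_date", "sample", "test_name", "report_date",
                   "result", "reference_range", "unit")


def latest_afp_per_patient(records, reference_date, window_days=30):
    """Return {empi: record} holding each patient's most recent valid serum AFP record."""
    cutoff = reference_date - timedelta(days=window_days)
    latest = {}
    for rec in records:
        # Step 51: drop records with missing key information.
        if any(rec.get(name) in (None, "") for name in REQUIRED_FIELDS):
            continue
        # Step 52: keep only serum alpha-fetoprotein tests.
        if rec["test_name"] != "alpha-fetoprotein" or rec["sample"] != "serum":
            continue
        # Step 52: keep only tests inside the assumed validity window.
        if not cutoff <= rec["test_date"] <= reference_date:
            continue
        # Step 52: keep the latest record per patient.
        current = latest.get(rec["empi"])
        if current is None or rec["test_date"] > current["test_date"]:
            latest[rec["empi"]] = rec
    return latest
```

Keeping a running maximum per patient gives the same record as sorting each patient's tests in reverse date order and taking the first one, without the sort.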
Step 6, summarizing the patient index number information, the image report and the alpha-fetoprotein test data to obtain summarized data. Specifically, the patient index number information is associated with the image report and the alpha-fetoprotein test data through the visit number, and complete summarized data are generated. Therefore, the summarized data can be used to query basic information such as the patient's unique index number EMPI; the scanning mode, nodule properties, nodule characteristics and nodule size of each of the patient's image reports; and key information such as the alpha-fetoprotein test result, the alpha-fetoprotein test date, the alpha-fetoprotein test unit and the alpha-fetoprotein reference range.
The method extracts the data of the patient dispersed in different systems, and arranges and unifies the data, screens out the key index information required by the subsequent analysis, establishes a primary liver cancer big data platform, and provides perfect and accurate original data for the automatic analysis of the hepatocellular carcinoma.
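The association by visit number described in step 6 can be sketched as a simple join. The data shapes used here (a mapping from visit number to EMPI, a list of image report dictionaries, and a mapping from visit number to alpha-fetoprotein record) are assumptions made for illustration and are not specified in the description.

```python
from collections import defaultdict


def summarize(patient_index, image_reports, afp_by_visit):
    """Join patient index, image reports and AFP data on the visit number."""
    # Group the image reports by visit number once, instead of rescanning per visit.
    reports_by_visit = defaultdict(list)
    for report in image_reports:
        reports_by_visit[report["visit_no"]].append(report)

    summary = {}
    for visit_no, empi in patient_index.items():
        summary[visit_no] = {
            "empi": empi,
            "image_reports": reports_by_visit.get(visit_no, []),
            "afp": afp_by_visit.get(visit_no),  # None when no valid AFP test exists
        }
    return summary
```

A visit without image reports or without alpha-fetoprotein data still appears in the result, which lets the later analysis distinguish missing data from a negative finding.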
Step 7, analyzing the summarized data according to a preset rule to obtain a disease analysis result for each patient. Fig. 6 is a flowchart of the steps of analyzing the summarized data according to the preset rule to obtain the disease analysis result of each patient according to the present invention, which includes the following steps:
step 701, judging whether the patient has image data, if so, executing step 702, otherwise, skipping to step 703 to continue executing;
step 702, judging whether a solid nodule exists according to the nodule property of the image data, if so, skipping to step 709 to continue execution, otherwise, skipping to step 704 to continue execution;
step 703, setting the analysis result to recommend that the patient undergo an ultrasound examination, and skipping to step 722 to continue execution;
step 704, judging whether the alpha fetoprotein detection result is positive, if so, executing step 705, otherwise, skipping to step 708 to continue executing;
step 705, judging whether the MRI image and the CT image are finished, if so, executing step 706, otherwise, skipping to step 707 to continue executing;
step 706, setting the analysis result to suggest that the patient carries out alpha fetoprotein detection and image follow-up for 2-3 months, and skipping to step 722 to continue execution;
step 707, setting the analysis result to suggest that the patient performs the review after completing the MRI image and the CT image, and skipping to step 722 to continue the execution;
step 708, setting the analysis result to suggest the patient to follow-up for 6 months, and skipping to step 722 to continue execution;
step 709, counting the maximum diameters of all the image nodules, if the maximum diameter of the nodule is larger than 2 cm, executing step 710, otherwise, executing step 713;
step 710, calculating the total number of the image examinations of the patient with the typical characteristics of liver cancer, if the total number is greater than or equal to 1 item, executing step 711, otherwise executing step 712;
step 711, setting the analysis result as primary liver cancer, and skipping to step 722 to continue execution;
step 712, counting the total number of the benign features of the patient nodule, and skipping to step 714 to continue execution;
step 713, calculating the total number of the image examinations of the patient with the typical characteristics of the liver cancer, if the total number is greater than or equal to 2 items, skipping to step 711 to continue execution, otherwise, skipping to step 712 to continue execution;
step 714, judging whether the benign tumor feature quantity is greater than or equal to 1 item, if so, executing step 715, otherwise, executing step 716;
step 715, setting the analysis result as benign tumor, recommending the patient to carry out 1 time of ultrasonic and serum AFP detection every 6 months, and skipping to step 722 to continue execution;
step 716, determining whether the enhanced MRI check is completed, if not, executing step 717, otherwise, executing step 718;
step 717, setting the analysis result to recommend that the patient undergo an enhanced MRI examination, and skipping to step 722 to continue execution;
step 718, determining whether the ultrasound contrast examination is completed, if so, executing step 719, otherwise, executing step 721;
step 719, judging whether the EOB-MRI check is completed, if so, executing step 706, otherwise, executing step 720;
step 720, setting the analysis result to suggest that the patient undergo a Primovist (EOB-MRI) examination, and skipping to step 722 to continue execution;
step 721, setting the analysis result as suggesting the patient to carry out the ultrasound contrast examination, and skipping to step 722 to continue execution;
and step 722, returning an analysis result, and ending the analysis flow.
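The branching of steps 701 to 722 can be written as a single function. The sketch below follows the step numbers of the flow, but the data model (`ImageExam`, `PatientSummary`), the modality labels and the wording of the returned recommendations are hypothetical. It also assumes that "MRI and CT finished" in step 705 means that exams labelled `mri` and `ct` both exist, and it takes the recommendation of step 720 to be the EOB-MRI examination checked in step 719, which is an interpretation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FOLLOW_UP_2_3_MONTHS = "AFP testing and imaging follow-up for 2-3 months"  # step 706


@dataclass
class ImageExam:
    modality: str                         # e.g. "ultrasound", "ct", "mri", "enhanced_mri", "ceus", "eob_mri"
    nodule_property: str = "none"         # "solid", "cystic" or "none"
    nodule_feature: Optional[str] = None  # "typical", "atypical" or "benign"
    nodule_size_cm: Optional[float] = None


@dataclass
class PatientSummary:
    exams: List[ImageExam] = field(default_factory=list)
    afp_positive: bool = False


def analyze(patient: PatientSummary) -> str:
    """Return the analysis result for one patient (step 722)."""
    if not patient.exams:                                           # 701 -> 703
        return "recommend ultrasound examination"
    done = {exam.modality for exam in patient.exams}

    if not any(e.nodule_property == "solid" for e in patient.exams):  # 702 -> 704
        if not patient.afp_positive:                                # 704 -> 708
            return "follow-up for 6 months"
        if {"mri", "ct"} <= done:                                   # 705 -> 706
            return FOLLOW_UP_2_3_MONTHS
        return "review after completing MRI and CT"                 # 707

    max_diameter = max(e.nodule_size_cm or 0.0 for e in patient.exams)        # 709
    typical = sum(1 for e in patient.exams if e.nodule_feature == "typical")
    required = 1 if max_diameter > 2 else 2                         # 710 / 713
    if typical >= required:
        return "primary liver cancer"                               # 711

    benign = sum(1 for e in patient.exams if e.nodule_feature == "benign")    # 712
    if benign >= 1:                                                 # 714 -> 715
        return "benign tumor; ultrasound and serum AFP every 6 months"
    if "enhanced_mri" not in done:                                  # 716 -> 717
        return "recommend enhanced MRI examination"
    if "ceus" not in done:                                          # 718 -> 721
        return "recommend contrast-enhanced ultrasound examination"
    if "eob_mri" in done:                                           # 719 -> 706
        return FOLLOW_UP_2_3_MONTHS
    return "recommend EOB-MRI examination"                          # 720
```

For example, a patient with one CT exam showing a solid 2.5 cm nodule with typical features is classified as primary liver cancer by steps 709 to 711, whereas the same finding at 1.5 cm needs typical features in at least two exams.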
According to the technical scheme, the method can efficiently and automatically analyze the hepatocellular carcinoma condition of each patient and give diagnosis and treatment guide suggestions based on the established primary liver cancer data platform according to the preset analysis and judgment rules, assists doctors in diagnosing and treating liver cancer, and can improve the working efficiency and the diagnosis accuracy.
Preferably, on the basis of obtaining the analysis result, in order to facilitate the query of medical staff and patients, the data analysis method based on primary liver cancer big data according to the present application may further include the following steps:
and step 8, displaying the disease analysis result on a terminal interface, wherein the display interface is shown in fig. 7.
In a preferred embodiment, the present disclosure further provides a computer-readable storage medium, where one or more programs are stored, and the one or more programs are executable by one or more processors to implement any of the above steps of the data analysis method based on primary liver cancer big data.
The above embodiments are merely illustrative of the technical solutions of the present invention, and the present invention is not limited to the above embodiments, and any modifications or alterations according to the principles of the present invention should be within the protection scope of the present invention.

Claims (8)

1. The data analysis method based on the primary liver cancer big data is characterized by comprising the following steps:
step 1, acquiring analysis data from an electronic medical record system, an image acquisition and output system and a laboratory examination information system; the analysis data comprises population information, clinical clinic visit information, inspection reports and image reports; the population information comprises outpatient service population information and hospitalization population information;
step 2, preprocessing the outpatient population information and the hospitalization population information to obtain unique index number information of the patient;
step 3, preprocessing the image report; the image report comprises an ultrasonic report, a CT report, a magnetic resonance report and a Primovist (Pumei) magnetic resonance report;
step 4, performing structured analysis processing on the image report to obtain a nodule attribute; the nodule attributes include a nodule property, a nodule feature, and a nodule size;
step 5, preprocessing the inspection report to obtain alpha fetoprotein inspection data;
step 6, summarizing the patient index number information, the image report and the alpha fetoprotein test data to obtain summarized data;
and step 7, analyzing the summarized data according to a preset rule to obtain a disease analysis result for each patient.
2. The data analysis method based on primary liver cancer big data as claimed in claim 1, wherein the preprocessing of the outpatient service population information and the hospitalization population information to obtain the patient index number information comprises the following steps:
step 21, carrying out data verification on the outpatient service population information and the hospitalization population information;
step 22, combining the outpatient service population information and the hospitalization population information and filtering repeated data;
step 23, creating a unique index number for each patient by adopting a Hash algorithm;
and step 24, associating the unique index number of the patient with the clinical visit information to form patient index number information.
3. The method for analyzing data based on primary liver cancer big data as claimed in claim 1, wherein said preprocessing said image report comprises the following steps:
step 31, cleaning up error data in the image report;
step 32, screening and filtering the data in the image report;
and step 33, splitting the image report into an ultrasonic report, a CT report, a magnetic resonance report and a Primovist (Pumei) magnetic resonance report according to the scanning mode of the image report.
4. The data analysis method based on primary liver cancer big data as claimed in claim 1, wherein said structural analysis processing of said image report to obtain nodule attributes comprises the following steps:
step 41, analyzing the properties and characteristics of the nodules from the inspection results of the image reports;
step 42, analyzing the size of the nodule from the inspection result of the image report;
step 43, the nodule properties, the nodule characteristics and the nodule size are associated and saved into the image report.
5. The method for analyzing data based on primary liver cancer big data as claimed in claim 1, wherein said preprocessing said test report to obtain alpha fetoprotein test data comprises the following steps:
step 51, cleaning up error data in the inspection report;
and step 52, screening the data in the test report to filter out the alpha-fetoprotein test data.
6. The method for analyzing data based on primary liver cancer big data according to claim 1, wherein the step of analyzing the summarized data according to a preset rule to obtain the disease analysis result of each patient comprises the following steps:
step 701, judging whether the patient has image data, if so, executing step 702, otherwise, skipping to step 703 to continue executing;
step 702, judging whether a solid nodule exists according to the nodule property of the image data, if so, skipping to step 709 to continue execution, otherwise, skipping to step 704 to continue execution;
step 703, setting the analysis result to recommend that the patient undergo an ultrasound examination, and skipping to step 722 to continue execution;
step 704, judging whether the alpha fetoprotein detection result is positive, if so, executing step 705, otherwise, skipping to step 708 to continue executing;
step 705, judging whether the MRI image and the CT image are finished, if so, executing step 706, otherwise, skipping to step 707 to continue executing;
step 706, setting the analysis result to suggest that the patient carries out alpha fetoprotein detection and image follow-up for 2-3 months, and skipping to step 722 to continue execution;
step 707, setting the analysis result to suggest that the patient performs the review after completing the MRI image and the CT image, and skipping to step 722 to continue the execution;
step 708, setting the analysis result to suggest the patient to follow-up for 6 months, and skipping to step 722 to continue execution;
step 709, counting the maximum diameters of all the image nodules, if the maximum diameter of the nodule is larger than 2 cm, executing step 710, otherwise, executing step 713;
step 710, calculating the total number of the image examinations of the patient with the typical characteristics of liver cancer, if the total number is greater than or equal to 1 item, executing step 711, otherwise executing step 712;
step 711, setting the analysis result as primary liver cancer, and skipping to step 722 to continue execution;
step 712, counting the total number of the benign features of the patient nodule, and skipping to step 714 to continue execution;
step 713, calculating the total number of the image examinations of the patient with the typical characteristics of the liver cancer, if the total number is greater than or equal to 2 items, skipping to step 711 to continue execution, otherwise, skipping to step 712 to continue execution;
step 714, judging whether the benign tumor feature quantity is greater than or equal to 1 item, if so, executing step 715, otherwise, executing step 716;
step 715, setting the analysis result as benign tumor, recommending the patient to carry out 1 time of ultrasonic and serum AFP detection every 6 months, and skipping to step 722 to continue execution;
step 716, determining whether the enhanced MRI check is completed, if not, executing step 717, otherwise, executing step 718;
step 717, setting the analysis result to recommend that the patient undergo an enhanced MRI examination, and skipping to step 722 to continue execution;
step 718, determining whether the ultrasound contrast examination is completed, if so, executing step 719, otherwise, executing step 721;
step 719, judging whether the EOB-MRI check is completed, if so, executing step 706, otherwise, executing step 720;
step 720, setting the analysis result to suggest that the patient undergo a Primovist (EOB-MRI) examination, and skipping to step 722 to continue execution;
step 721, setting the analysis result as suggesting the patient to carry out the ultrasound contrast examination, and skipping to step 722 to continue execution;
and step 722, returning an analysis result, and ending the analysis flow.
7. The data analysis method based on the primary liver cancer big data, according to claim 1, is characterized by comprising the following steps:
and step 8, displaying the disease condition analysis result on a terminal interface.
8. A computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of the method for primary liver cancer big data based data analysis according to any one of claims 1 to 7.
CN202110047657.3A 2021-01-14 2021-01-14 Data analysis method based on primary liver cancer big data and storage medium Pending CN112712899A (en)

Publications (1)

Publication Number Publication Date
CN112712899A true CN112712899A (en) 2021-04-27

Family

ID=75549028

Country Status (1)

Country Link
CN (1) CN112712899A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019076830A1 (en) * 2017-10-16 2019-04-25 Biopredictive Method of prognosis and follow up of primary liver cancer
CN110993098A (en) * 2019-12-06 2020-04-10 高春芳 Establishment and application of novel blood multi-index liver cancer diagnosis model (GAP-TALAD)
CN111681737A (en) * 2020-05-07 2020-09-18 陈�峰 Structured report system and method for constructing liver cancer image database

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
中华人民共和国国家卫生健康委员会医政医管局: "原发性肝癌诊疗规范(2019年版)", 《中国实用外科杂志》 *
刘英平等: "甲胎蛋白与癌胚抗原在原发性肝癌与转移性肝癌鉴别诊断中的意义分析", 《中国实验诊断学》 *
王垒 等: "原发性肝癌大数据建设初步探索", 《中华肝胆外科杂志》 *
王垒等: "基于大数据平台的肝细胞癌自动化中国分期模型研究", 《中华肝脏外科手术学电子杂志》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210427
