CN110096495A - Precision medicine big data analysis and processing system - Google Patents

Info

Publication number
CN110096495A
Authority
CN
China
Prior art keywords
data
medical
medical data
distributed
hadoop
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910219554.3A
Other languages
Chinese (zh)
Inventor
明炬
杨峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Mingyangda Data Technology Co Ltd
Original Assignee
Wuhan Mingyangda Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Mingyangda Data Technology Co Ltd
Priority to CN201910219554.3A
Publication of CN110096495A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Quality & Reliability (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The present invention provides a precision medicine big data analysis and processing system. The system acquires medical data and cleanses the collected data by checking and verifying it, deleting duplicate data, and correcting erroneous data. It then performs data conversion, converting the structure of the medical data into a form that meets storage requirements, and finally transmits the converted medical data to a primary database, a distributed database, or a Hadoop subsystem. The system also provides a data search window and a data visualization window: the user enters text related to the medical data to be searched in the data search window, an analysis and processing module analyzes and evaluates the text, retrieves medical data from the primary database, the distributed database, or the Hadoop subsystem as appropriate, and displays the medical data in the data visualization window.

Description

Precision medicine big data analysis and processing system
Technical field
The present invention relates to the field of big data, and in particular to a precision medicine big data analysis and processing system.
Background art
A medical big data analysis and processing system is a big data processing platform built on key enterprise-level computing technologies. It uses big data processing to identify the genetic characteristics and molecular pathogenesis of tumors and provides powerful, precise guidance for individualized cancer prevention and treatment, and it can effectively address the difficulty of fusing clinical data with omics data. However, China currently has no comparable platform for managing and sharing precision medicine big data, so the ways of retrieving precision medical data are limited, and the retrieved data are often incomplete, poorly curated, or unclear in their relationships, which causes considerable inconvenience.
Summary of the invention
The technical problem to be solved by the present invention is the difficulty of fusing current clinical data with omics data described above. To address this defect, a precision medicine big data analysis and processing system is provided.
A precision medicine big data analysis and processing system, comprising:
Data acquisition module: used to acquire medical data and to cleanse the collected medical data by checking and verifying it, deleting duplicate data, and correcting erroneous data; then to perform data conversion, converting the structure of the medical data into a data form that meets storage requirements; and finally to transmit the converted medical data to the primary database, the distributed database, or the Hadoop subsystem; the collected medical data includes structured medical data, semi-structured medical data, and unstructured medical data;
Distributed database: used to store the distributed structured medical data among the medical data that has undergone data cleansing and data conversion, and to perform distributed computing, in-depth data analysis, and data mining on the distributed structured medical data; it associates and summarizes the distributed structured medical data and can export the associated and summarized medical data sets to the primary database;
Primary database: used to store the medical data that has undergone data cleansing and data conversion; semi-structured and unstructured medical data in the primary database can be loaded into the Hadoop subsystem for Hadoop processing, and structured medical data can be loaded into the distributed database for storage;
Hadoop subsystem: used to store the semi-structured and unstructured medical data among the medical data that has undergone data cleansing and data conversion, and to perform Hadoop processing on that semi-structured and unstructured medical data to obtain new structured medical data, which is loaded into the distributed database; structured medical data requires no such processing and is loaded directly into the distributed database;
Human-computer interaction platform: used to provide a data search window and a data visualization window; the user enters text related to the medical data to be searched in the data search window, the analysis and processing module analyzes and evaluates the text and retrieves medical data from the primary database, the distributed database, or the Hadoop subsystem as appropriate, and the medical data is displayed in the data visualization window;
Analysis and processing module: used to determine the category of the medical data to be retrieved, to retrieve the medical data related to the input text from one or more of the primary database, the distributed database, and the Hadoop subsystem, and to apply data mining algorithms to mine data in the primary database, the distributed database, and the Hadoop subsystem, so that complete and accurate medical data is retrieved.
Further, the data acquisition module acquires medical data using the data warehouse technique ETL.
Further, the distributed database uses an operational data store (ODS) to store the distributed structured medical data among the medical data that has undergone data cleansing and data conversion, and its data storage supports the petabyte (PB) scale.
Further, the primary database imposes no requirements on the data type, data structure, or storage method of the medical data it stores.
Further, the Hadoop subsystem can perform Hadoop processing on the medical data it stores itself, and can also load medical data from the primary database for Hadoop processing.
Further, the structured medical data includes various disease data, drug data, treatment data, and the relationships between the data; the semi-structured medical data includes image data; and the unstructured medical data includes gene data.
Further, the data mining algorithms used in the analysis and processing module include artificial neural networks, the ID3 decision tree algorithm, aggregation, and the RSL language in rough sets.
The beneficial effects of the present invention are as follows: in view of the characteristics of precision medicine big data and of big data storage and transmission methods, this patent provides a medical big data storage and analysis system with data search and visualization functions. By also applying deep data mining and machine learning, it can provide efficient, professional, and precise technical guidance and data analysis for precision medicine applications.
Brief description of the drawings
The present invention is further described below with reference to the drawing and an embodiment. In the drawing:
Fig. 1 is a structural diagram of the precision medicine big data analysis and processing system of the present invention.
Detailed description of the embodiments
For a clearer understanding of the technical features, objects, and effects of the present invention, a specific embodiment is now described in detail with reference to the drawing.
The precision medicine big data analysis and processing system, as shown in Fig. 1, comprises:
Data acquisition module: used to acquire medical data using ETL and to cleanse the collected medical data by checking and verifying it, deleting duplicate data, and correcting erroneous data; then to perform data conversion, converting the structure of the medical data into a data form that meets storage requirements; and finally to transmit the converted medical data to the primary database, the distributed database, or the Hadoop subsystem. The collected medical data includes structured medical data, semi-structured medical data, and unstructured medical data.
ETL (Extract-Transform-Load) is a data warehouse technique and a data processing method: data is extracted from a data source, transformed, and then loaded into a destination, with the aim of exploiting the parallel processing capability of the destination.
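The extract, cleanse, convert, and load steps described for the data acquisition module can be illustrated with a minimal Python sketch. The patent does not define a record schema or any function names, so `MedicalRecord`, its fields, and `run_etl` are hypothetical, and the "correction" step here is limited to normalizing whitespace and letter case.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MedicalRecord:
    """Hypothetical target schema; the patent does not specify one."""
    patient_id: str
    kind: str      # "structured", "semi-structured" or "unstructured"
    payload: str


def extract(rows: Iterable[dict]) -> List[dict]:
    """Extract: pull raw rows from a source (here, any iterable of dicts)."""
    return list(rows)


def clean(rows: List[dict]) -> List[dict]:
    """Cleanse: verify rows, correct simple errors, and delete duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Correction (assumed rule): trim whitespace and lower-case the kind.
        patient_id = str(row.get("patient_id", "")).strip()
        kind = str(row.get("kind", "")).strip().lower()
        payload = str(row.get("payload", "")).strip()
        # Verification (assumed rule): an ID and a payload are mandatory.
        if not patient_id or not payload:
            continue
        key = (patient_id, kind, payload)
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        cleaned.append({"patient_id": patient_id, "kind": kind, "payload": payload})
    return cleaned


def transform(rows: List[dict]) -> List[MedicalRecord]:
    """Convert: map cleaned rows onto the storage schema."""
    return [MedicalRecord(**row) for row in rows]


def run_etl(source: Iterable[dict], destination: List[MedicalRecord]) -> None:
    """Load: append the converted records to the destination store."""
    destination.extend(transform(clean(extract(source))))
```

In this sketch the destination is a plain list standing in for whichever store receives the data; a real deployment would write to a database or a distributed file system instead.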
Distributed database: used to store, by means of an ODS, the distributed structured medical data among the medical data that has undergone data cleansing and data conversion, and to perform distributed computing, in-depth data analysis, and data mining on the distributed structured medical data. It associates and summarizes the distributed structured medical data, for example by associating the pathological information of a disease with its treatment methods, and can export the associated and summarized medical data sets to the primary database. Data storage supports the petabyte (PB) scale.
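The "associate and summarize" step, using the example of linking a disease's pathological information to its treatment methods, might look like the following Python sketch. The field names `disease`, `finding`, and `method` are assumptions made for illustration; the patent does not describe the actual table layout or the join logic.

```python
from collections import defaultdict
from typing import Dict, List


def associate_by_disease(
    pathology_rows: List[dict], treatment_rows: List[dict]
) -> Dict[str, Dict[str, List[str]]]:
    """Group pathological findings and treatment methods under each disease."""
    summary: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"pathology": [], "treatments": []}
    )
    for row in pathology_rows:
        summary[row["disease"]]["pathology"].append(row["finding"])
    for row in treatment_rows:
        summary[row["disease"]]["treatments"].append(row["method"])
    # Return a plain dict so callers do not create entries by accident.
    return dict(summary)
```

The resulting dictionary is one possible shape for the summarized data set that is exported to the primary database.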
ODS (Operational Data Store) is an operational data store. A data warehouse system is very complex: its data sources differ in type, storage location, format, and so on, so extracting data directly from them is very difficult. The ODS is used to store the source data with a largely consistent data structure and logical relationships, which greatly reduces the complexity of data extraction.
Primary database: used to store the medical data that has undergone data cleansing and data conversion. The primary database is a general-purpose database and imposes no requirements on the data type, data structure, or storage method of the medical data it stores. Semi-structured and unstructured medical data in the primary database can be loaded into the Hadoop subsystem for Hadoop processing, and structured medical data can be loaded into the distributed database for storage.
Hadoop subsystem: used to store the semi-structured and unstructured medical data among the medical data that has undergone data cleansing and data conversion, and to perform Hadoop processing on that semi-structured and unstructured medical data to obtain new structured medical data, which is loaded into the distributed database; structured medical data requires no such processing and is loaded directly into the distributed database. Structured medical data mainly includes various disease data, drug data, treatment data, and the relationships between the data; semi-structured medical data mainly includes image data; and unstructured medical data mainly includes gene data.
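The routing rule described for the Hadoop subsystem can be sketched in Python as follows. This is only an illustration under stated assumptions: the stores are plain lists, `route_records` is a hypothetical name, and the `to_structured` callback stands in for whatever "Hadoop processing" (for example a MapReduce job) derives structured rows from images or gene data; the patent does not specify that processing.

```python
from typing import Callable, Dict, List

STRUCTURED = "structured"


def route_records(
    records: List[dict],
    stores: Dict[str, List[dict]],
    to_structured: Callable[[dict], dict],
) -> None:
    """Send each record to the store the description assigns to its kind."""
    for record in records:
        if record["kind"] == STRUCTURED:
            # Structured data needs no processing: load it directly.
            stores["distributed_db"].append(record)
        else:
            # Semi-structured and unstructured data is kept in Hadoop ...
            stores["hadoop"].append(record)
            # ... and the structured result of processing it is loaded
            # into the distributed database.
            stores["distributed_db"].append(to_structured(record))
```

Passing the processing step as a callback keeps the routing logic independent of how the structured result is actually computed.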
Human-computer interaction platform: used to provide a data search window and a data visualization window. The user enters text related to the medical data to be searched in the data search window, the analysis and processing module analyzes and evaluates the text and retrieves medical data from the primary database, the distributed database, or the Hadoop subsystem as appropriate, and the medical data is displayed in the data visualization window.
Analysis and processing module: used to determine the category of the medical data to be retrieved, to retrieve the medical data related to the input text from one or more of the primary database, the distributed database, and the Hadoop subsystem, and to apply data mining algorithms (such as artificial neural networks, the ID3 decision tree algorithm, aggregation, and the RSL language in rough sets) to mine data in the primary database, the distributed database, and the Hadoop subsystem, so as to ensure that complete and accurate medical data is retrieved.
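Of the data mining algorithms named above, ID3 is the easiest to illustrate briefly. ID3 builds a decision tree by repeatedly splitting on the attribute with the highest information gain, that is, the largest reduction in Shannon entropy of the target label. The Python sketch below shows only that attribute-selection step; the row layout and attribute names are hypothetical, and the patent does not say how ID3 is applied to the medical data.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows: List[dict], attribute: str, target: str) -> float:
    """Entropy of the target minus its weighted entropy after the split."""
    base = entropy([row[target] for row in rows])
    partitions: Dict[str, List[str]] = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return base - remainder


def best_attribute(rows: List[dict], attributes: List[str], target: str) -> str:
    """ID3 split criterion: pick the attribute with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))
```

A full ID3 implementation would call `best_attribute` recursively on each partition until the labels in a partition are all the same or no attributes remain.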
In use, the user searches for "clinical" in the data search window of the human-computer interaction platform, and the analysis and processing module determines the categories of medical data related to clinical practice. In general, clinical data includes plain data (such as vital-sign parameters and laboratory test results), clinical images (such as the examination results of B-mode ultrasound, CT, MRI, and other medical imaging devices), text information (such as patient identity records, symptom descriptions, and written descriptions of test and diagnosis results), and genetic test information (such as gene mutations, rearrangements, and amplifications in some of the patient's genes). The analysis and processing module retrieves the plain data and text information from the primary database and the distributed database, retrieves the clinical images and genetic test information from the Hadoop subsystem, summarizes and organizes the retrieved medical data, and finally displays the summarized results in the data visualization window of the human-computer interaction platform.
The medical data stored in the primary database has the same content as the medical data stored in the distributed database; the two databases differ mainly in how the data is stored. A distributed database has several advantages: a flexible architecture, suitability for distributed management and control, good cost-effectiveness, high reliability and availability, fast response for local applications, good scalability, and easy integration with existing systems. It also has disadvantages: high communication overhead, complex access structures, and data security and confidentiality that are hard to guarantee. In actual use, therefore, a suitable database type (primary database or distributed database) can be chosen according to actual needs.
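The "clinical" search walkthrough in this embodiment amounts to a two-step lookup: map the query to data categories, then map each category to the stores that hold it. A minimal Python sketch under that reading follows. The two tables and the name `plan_retrieval` are assumptions for illustration; the patent does not describe how the analysis and processing module actually analyzes the text.

```python
from typing import Dict, List

# Which stores hold each category, following the walkthrough in the text.
CATEGORY_STORES: Dict[str, List[str]] = {
    "plain_data": ["primary_db", "distributed_db"],
    "text": ["primary_db", "distributed_db"],
    "clinical_image": ["hadoop"],
    "genetic_test": ["hadoop"],
}

# Hypothetical keyword table; only the "clinical" example comes from the text.
QUERY_CATEGORIES: Dict[str, List[str]] = {
    "clinical": ["plain_data", "clinical_image", "text", "genetic_test"],
}


def plan_retrieval(query: str) -> Dict[str, List[str]]:
    """Return, for each store, the categories to retrieve for this query."""
    categories = QUERY_CATEGORIES.get(query.strip().lower(), [])
    plan: Dict[str, List[str]] = {}
    for category in categories:
        for store in CATEGORY_STORES[category]:
            plan.setdefault(store, []).append(category)
    return plan
```

Because the primary database and the distributed database hold the same content, a caller could query only one of the two for plain data and text, in line with the remark that either database type can be chosen according to actual needs.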
The embodiment of the present invention has been described above with reference to the drawing, but the invention is not limited to the specific embodiment described, which is illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the invention and the scope protected by the claims, and all of these fall within the protection of the present invention.

Claims (7)

1. A precision medicine big data analysis and processing system, characterized by comprising:
Data acquisition module: used to acquire medical data and to cleanse the collected medical data by checking and verifying it, deleting duplicate data, and correcting erroneous data; then to perform data conversion, converting the structure of the medical data into a data form that meets storage requirements; and finally to transmit the converted medical data to the primary database, the distributed database, or the Hadoop subsystem; the collected medical data includes structured medical data, semi-structured medical data, and unstructured medical data;
Distributed database: used to store the distributed structured medical data among the medical data that has undergone data cleansing and data conversion, and to perform distributed computing, in-depth data analysis, and data mining on the distributed structured medical data; it associates and summarizes the distributed structured medical data and can export the associated and summarized medical data sets to the primary database;
Primary database: used to store the medical data that has undergone data cleansing and data conversion; semi-structured and unstructured medical data in the primary database can be loaded into the Hadoop subsystem for Hadoop processing, and structured medical data can be loaded into the distributed database for storage;
Hadoop subsystem: used to store the semi-structured and unstructured medical data among the medical data that has undergone data cleansing and data conversion, and to perform Hadoop processing on that semi-structured and unstructured medical data to obtain new structured medical data, which is loaded into the distributed database; structured medical data requires no such processing and is loaded directly into the distributed database;
Human-computer interaction platform: used to provide a data search window and a data visualization window; the user enters text related to the medical data to be searched in the data search window, the analysis and processing module analyzes and evaluates the text and retrieves medical data from the primary database, the distributed database, or the Hadoop subsystem as appropriate, and the medical data is displayed in the data visualization window;
Analysis and processing module: used to determine the category of the medical data to be retrieved, to retrieve the medical data related to the input text from one or more of the primary database, the distributed database, and the Hadoop subsystem, and to apply data mining algorithms to mine data in the primary database, the distributed database, and the Hadoop subsystem, so that complete and accurate medical data is retrieved.
2. The precision medicine big data analysis and processing system according to claim 1, characterized in that the data acquisition module acquires medical data using the data warehouse technique ETL.
3. The precision medicine big data analysis and processing system according to claim 1, characterized in that the distributed database uses an operational data store (ODS) to store the distributed structured medical data among the medical data that has undergone data cleansing and data conversion, and its data storage supports the petabyte (PB) scale.
4. The precision medicine big data analysis and processing system according to claim 1, characterized in that the primary database imposes no requirements on the data type, data structure, or storage method of the medical data it stores.
5. The precision medicine big data analysis and processing system according to claim 1, characterized in that the Hadoop subsystem can perform Hadoop processing on the medical data it stores itself, and can also load medical data from the primary database for Hadoop processing.
6. The precision medicine big data analysis and processing system according to claim 1, characterized in that the structured medical data includes various disease data, drug data, treatment data, and the relationships between the data; the semi-structured medical data includes image data; and the unstructured medical data includes gene data.
7. The precision medicine big data analysis and processing system according to claim 1, characterized in that the data mining algorithms used in the analysis and processing module include artificial neural networks, the ID3 decision tree algorithm, aggregation, and the RSL language in rough sets.
CN201910219554.3A 2019-03-22 2019-03-22 Precision medicine big data analysis and processing system Pending CN110096495A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910219554.3A CN110096495A (en) 2019-03-22 2019-03-22 Precision medicine big data analysis and processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910219554.3A CN110096495A (en) 2019-03-22 2019-03-22 Precision medicine big data analysis and processing system

Publications (1)

Publication Number Publication Date
CN110096495A true CN110096495A (en) 2019-08-06

Family

ID=67443306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910219554.3A Pending CN110096495A (en) 2019-03-22 2019-03-22 Precision medicine big data analysis and processing system

Country Status (1)

Country Link
CN (1) CN110096495A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111181786A (en) * 2019-12-30 2020-05-19 杭州东方通信软件技术有限公司 User feedback fault information processing method, device, server and storage medium
CN111324671A (en) * 2020-03-02 2020-06-23 苏州工业园区洛加大先进技术研究院 Biomedical high-speed information processing and analyzing system based on big data technology

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116643A (en) * 2013-02-25 2013-05-22 江苏物联网研究发展中心 Hadoop-based intelligent medical data management method
US20140169645A1 (en) * 2012-12-14 2014-06-19 Advanced Medical Imaging Development S.R.L. Method and system for medical imaging data management
CN105279375A (en) * 2015-10-22 2016-01-27 杭州电子科技大学 Regional medical image storage system based on Hadoop
CN105718732A (en) * 2016-01-20 2016-06-29 华中科技大学同济医学院附属协和医院 Medical data collection and analysis method and system
CN108182963A (en) * 2017-12-14 2018-06-19 山东浪潮云服务信息科技有限公司 A kind of medical data processing method and processing device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140169645A1 (en) * 2012-12-14 2014-06-19 Advanced Medical Imaging Development S.R.L. Method and system for medical imaging data management
CN103116643A (en) * 2013-02-25 2013-05-22 江苏物联网研究发展中心 Hadoop-based intelligent medical data management method
CN105279375A (en) * 2015-10-22 2016-01-27 杭州电子科技大学 Regional medical image storage system based on Hadoop
CN105718732A (en) * 2016-01-20 2016-06-29 华中科技大学同济医学院附属协和医院 Medical data collection and analysis method and system
CN108182963A (en) * 2017-12-14 2018-06-19 山东浪潮云服务信息科技有限公司 A kind of medical data processing method and processing device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
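常强等 (Chang Qiang et al.): "基于Hadoop平台的氢分子生物医学数据仓库的分析与实现" [Analysis and implementation of a hydrogen molecular biomedicine data warehouse based on the Hadoop platform], 《电子技术与软件工程》 *
李伟等 (Li Wei et al.): "精准医学大数据平台关键技术研究" [Research on key technologies of a precision medicine big data platform], 《医疗卫生装备》 *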

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111181786A (en) * 2019-12-30 2020-05-19 杭州东方通信软件技术有限公司 User feedback fault information processing method, device, server and storage medium
CN111181786B (en) * 2019-12-30 2022-06-10 杭州东方通信软件技术有限公司 User feedback fault information processing method, device, server and storage medium
CN111324671A (en) * 2020-03-02 2020-06-23 苏州工业园区洛加大先进技术研究院 Biomedical high-speed information processing and analyzing system based on big data technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190806