JP7459386B2 - Disease diagnosis prediction system based on graph neural network - Google Patents


Info

Publication number
JP7459386B2
Authority
JP
Japan
Prior art keywords
disease
symptom
patient
graph
learning
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2023536567A
Other languages
Japanese (ja)
Other versions
JP2024503980A (en)
Inventor
勁松 李
勝強 池
宇清 王
雨 田
天舒 周
Original Assignee
之江実験室 (Zhejiang Lab)
Application filed by 之江実験室
Published as JP2024503980A
Application granted
Published as JP7459386B2
Current legal status: Active


Classifications

    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G06F16/367 Ontology
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2431 Multiple classes
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Description

The present invention belongs to the field of medical and health information technology. It relates in particular to a disease diagnosis prediction system based on a graph neural network.

The medical and health field has many well-organized knowledge maps. Examples include the International Classification of Diseases, DrugBank, clinical guidelines and common-sense knowledge. These carry hierarchical information that matches human cognition, along with complex associations. A knowledge map is a heterogeneous graph network containing many kinds of relationships. Disease diagnosis prediction depends heavily on how a model uses both the expert knowledge in a knowledge map and electronic medical record (EMR) data, integrating knowledge and data in a single model.

Existing disease prediction methods based on graph neural network models lack an effective way to fuse medical knowledge maps with EMR data into a heterogeneous graph network. The main current approaches fall into three types.

(1) Data-driven graph network modeling. A graph network is built from EMR data, and a graph neural network model performs disease prediction. This approach does not make full use of existing medical knowledge sources.

(2) Two-stage modeling of knowledge representation learning and disease prediction. A medical knowledge map is learned first to obtain vector representations of the knowledge. These are then fused with EMR data for disease prediction. Because training is staged, this approach cannot obtain the knowledge representation best suited to disease prediction.

(3) End-to-end modeling focused solely on the disease prediction task. Medical knowledge maps and EMR data are fused into a heterogeneous graph network, and a graph neural network model performs disease prediction. This resolves the shortcomings of the two approaches above. However, the model optimizes only the disease prediction task, so noise in the data may corrupt the learned knowledge.

To address these shortcomings of the prior art, the present invention provides a disease diagnosis prediction system based on a graph neural network.

The object of the invention is achieved by the following technical solution.

The present invention provides a disease diagnosis prediction system based on a graph neural network. The system comprises:
(1) a knowledge map construction module that constructs a disease-symptom knowledge map from medical knowledge sources;
(2) a data extraction and preprocessing module that extracts patient EMR data from an electronic medical record system, where the data includes patient diagnoses and symptoms and is stored in triple format;
(3) a disease diagnosis model construction module that performs graph neural network learning and predictive modeling on the disease-symptom knowledge map and the EMR data; and
(4) a disease diagnosis model application module that uses the disease diagnosis model to predict diagnoses from the symptoms of a new patient.
The graph neural network learning and predictive modeling comprises constructing a heterogeneous graph network and constructing a disease diagnosis model.
The heterogeneous graph network comprises two subgraphs. The disease-symptom subgraph is constructed by extracting disease-symptom relationships from the disease-symptom knowledge map. The patient-symptom subgraph is constructed from the patient diagnosis and symptom data in triple format.
The disease diagnosis model consists of a graph encoder and a graph decoder.
The graph encoder is based on a graph convolutional neural network. It takes three inputs: the initial node embeddings of diseases, symptoms and patients (obtained from a disease-symptom co-occurrence matrix), the disease-symptom adjacency matrix and the patient-symptom adjacency matrix. Nodes of different types pass information along their connecting edges. Node embedding update operations yield the disease, symptom and patient node embeddings, which are passed to the graph decoder.
The graph decoder performs multi-task learning on the node embeddings. The multi-task learning comprises parts a) to c):
a) Multi-label hierarchical classification for patient diagnosis prediction. A disease hierarchy is built from the hierarchical structure of diseases. It has two layers: the disease layer whose diagnoses are to be predicted, and a disease system classification layer derived from medical knowledge. A multi-label hierarchical classifier is built, and a loss function for multi-label hierarchical classification is designed.
b) Disease contrastive learning. A disease-pair system-category discriminator is built, and the distance between the two diseases in a disease pair is computed. A loss function for disease contrastive learning is designed.
c) Disease-symptom relationship learning. A disease-symptom relationship learner is built, and the probability that the disease and symptom in a disease-symptom pair are related is computed. A loss function for disease-symptom relationship learning is designed.
The loss function of the disease diagnosis model is the sum of three losses: the multi-label hierarchical classification loss, the disease contrastive learning loss and the disease-symptom relationship learning loss.

Further, in the knowledge map construction module, the disease-symptom knowledge map includes two node types, disease and symptom, and one relationship type, disease-symptom.

Further, the heterogeneous graph network is constructed from the disease-symptom knowledge map and the EMR data. It includes three node types: disease, symptom and patient. Symptoms are intermediate nodes that connect diseases and patients. The network integrates two relationship subgraphs: the disease- and symptom-related subgraph of the disease-symptom knowledge map, and the patient- and symptom-related subgraph of the EMR data.

Further, the heterogeneous graph network is denoted $G=(V,E)$, with node set $V=D\cup S\cup P$. Here $D$, $S$ and $P$ are the given disease set, symptom set and patient set, respectively:

$D=\{d_1,\dots,d_{|D|}\}$, $S=\{s_1,\dots,s_{|S|}\}$, $P=\{p_1,\dots,p_{|P|}\}$

$|D|$, $|S|$ and $|P|$ are the number of disease types, the number of symptom types and the number of patients.

The edges in $E$ are typed by the relation set $R$, which contains the disease-symptom relationship $r_{DS}$ and the patient-symptom relationship $r_{PS}$. Disease-symptom relationships are stored in the disease-symptom adjacency matrix $A_{DS}\in\{0,1\}^{|D|\times|S|}$. Patient-symptom relationships are stored in the patient-symptom adjacency matrix $A_{PS}\in\{0,1\}^{|P|\times|S|}$.
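The following is a minimal Python sketch of how the two adjacency matrices might be assembled. It builds a binary disease-symptom matrix from knowledge-map edges and a binary patient-symptom matrix from EMR triples. The names `A_DS`, `A_PS` and `build_adjacency`, the example entities and the relation label `has_symptom` are all hypothetical. The source does not specify the triple schema.

```python
import numpy as np

def build_adjacency(pairs, row_ids, col_ids):
    """Return a 0/1 matrix with one row per row_id and one column per col_id."""
    row_index = {name: i for i, name in enumerate(row_ids)}
    col_index = {name: j for j, name in enumerate(col_ids)}
    adj = np.zeros((len(row_ids), len(col_ids)), dtype=np.int8)
    for row_name, col_name in pairs:
        adj[row_index[row_name], col_index[col_name]] = 1
    return adj

# Hypothetical entities.
diseases = ["influenza", "migraine"]
symptoms = ["fever", "cough", "headache"]
patients = ["p1", "p2"]

# Disease-symptom edges as they might come from the knowledge map.
kg_edges = [("influenza", "fever"), ("influenza", "cough"), ("migraine", "headache")]
# EMR data as hypothetical (patient, relation, symptom) triples.
emr_triples = [("p1", "has_symptom", "fever"), ("p1", "has_symptom", "headache"),
               ("p2", "has_symptom", "cough")]

A_DS = build_adjacency(kg_edges, diseases, symptoms)
A_PS = build_adjacency([(p, s) for p, rel, s in emr_triples if rel == "has_symptom"],
                       patients, symptoms)
print(A_DS)
print(A_PS)
```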

Further, the initial node embeddings are generated as follows.

Construct the disease-symptom co-occurrence matrix $M\in\mathbb{R}^{|D|\times|S|}$. Its entry in row $i$ and column $j$, $M_{ij}$, is the number of patients in the EMR data who were diagnosed with disease $d_i$ and presented symptom $s_j$.

Row-normalize $M$ to obtain $M^{D}$. The initial embedding of disease $d_i$ is $x_{d_i}=M^{D}_{i,:}$, the $i$-th row of $M^{D}$.

Column-normalize $M$ to obtain $M^{S}$. The initial embedding of symptom $s_j$ is $x_{s_j}=M^{S}_{:,j}$, the $j$-th column of $M^{S}$.

The initial embedding of patient $p_k$ is

$x_{p_k}=\frac{1}{|S_{p_k}|}\sum_{s_j\in S_{p_k}}x_{s_j}$

where $S_{p_k}$ is the set of symptoms of patient $p_k$ and $|S_{p_k}|$ is its size.
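The sketch below computes the co-occurrence matrix and the row- and column-normalized initial embeddings with NumPy. It starts from hypothetical binary patient-diagnosis and patient-symptom matrices. Two choices are assumptions:

- normalizing so that each row (or column) sums to one;
- taking a patient's initial embedding to be the mean of its symptoms' initial embeddings.

The source states only that the patient formula involves the patient's symptom count.

```python
import numpy as np

def initial_embeddings(diag, symp):
    """diag: |P| x |D| 0/1 diagnosis matrix; symp: |P| x |S| 0/1 symptom matrix."""
    # M[i, j] = number of patients diagnosed with disease i who presented symptom j.
    M = (diag.T @ symp).astype(float)
    # Row normalization (assumed: rows sum to 1); row i is disease i's initial embedding.
    row_sums = M.sum(axis=1, keepdims=True)
    x_disease = np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)
    # Column normalization (assumed: columns sum to 1); column j is symptom j's embedding,
    # transposed here so that row j holds symptom j.
    col_sums = M.sum(axis=0, keepdims=True)
    x_symptom = np.divide(M, col_sums, out=np.zeros_like(M), where=col_sums > 0).T
    # Assumed patient embedding: mean of the embeddings of the patient's symptoms.
    n_symptoms = symp.sum(axis=1, keepdims=True)
    summed = symp @ x_symptom
    x_patient = np.divide(summed, n_symptoms, out=np.zeros_like(summed), where=n_symptoms > 0)
    return x_disease, x_symptom, x_patient

diag = np.array([[1, 0], [1, 0], [0, 1]])            # 3 patients, 2 diseases
symp = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]])   # 3 patients, 3 symptoms
x_d, x_s, x_p = initial_embeddings(diag, symp)
```

The resulting lengths differ by node type. Disease embeddings have length |S|, while symptom and (under this assumption) patient embeddings have length |D|. A projection to a common dimension is therefore needed before the graph encoder.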

Further, the initial embeddings of each node type are fed into their own multilayer perceptron. This projects them to initial embeddings of a common dimension before they enter the graph encoder.
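Here is a minimal sketch of the per-type projection. It makes the following illustrative choices, none of which come from the source:

- a bias-free two-layer ReLU perceptron with randomly initialized weights;
- a hidden size of 16 and an output size of 8;
- the helper name `make_mlp`.

The patient input length of |D| follows from the averaging assumption used for patient embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim, hidden_dim, out_dim):
    """Return a bias-free two-layer ReLU perceptron (hypothetical architecture)."""
    W1 = rng.normal(scale=0.1, size=(in_dim, hidden_dim))
    W2 = rng.normal(scale=0.1, size=(hidden_dim, out_dim))
    return lambda x: np.maximum(x @ W1, 0.0) @ W2

num_diseases, num_symptoms, dim = 2, 3, 8
# One projector per node type, because the raw embedding lengths differ.
project_disease = make_mlp(num_symptoms, 16, dim)   # disease inputs: length |S|
project_symptom = make_mlp(num_diseases, 16, dim)   # symptom inputs: length |D|
project_patient = make_mlp(num_diseases, 16, dim)   # patient inputs: length |D| (assumed)

h_d = project_disease(np.ones((num_diseases, num_symptoms)))
h_s = project_symptom(np.ones((num_symptoms, num_diseases)))
h_p = project_patient(np.ones((4, num_diseases)))   # 4 hypothetical patients
```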

Further, the graph encoder updates the node embeddings layer by layer. The explicit update equations are given as images in the original publication. The update for each node type is as follows:

- Disease $d_i$: the next-layer embedding $h^{(l+1)}_{d_i}$ is computed from the layer-$l$ embeddings of its adjacent symptoms $N_S(d_i)$.
- Symptom $s_j$: the next-layer embedding $h^{(l+1)}_{s_j}$ is computed from the layer-$l$ embeddings of its adjacent diseases $N_D(s_j)$ and adjacent patients $N_P(s_j)$.
- Patient $p_k$: the next-layer embedding $h^{(l+1)}_{p_k}$ is computed from the layer-$l$ embeddings of its adjacent symptoms $N_S(p_k)$.

The symbols are defined as follows:

- $\sigma$ is the activation function.
- $W^{(l)}_{DS}$ and $W^{(l)}_{PS}$ are the layer-$l$ disease-symptom and patient-symptom weight matrices, learned by training the disease diagnosis model.
- $h^{(l)}_{d_i}$, $h^{(l)}_{s_j}$ and $h^{(l)}_{p_k}$ are the layer-$l$ node embeddings of disease $d_i$, symptom $s_j$ and patient $p_k$.
- $N_S(d_i)$ is the set of symptom nodes adjacent to disease $d_i$.
- $N_D(s_j)$ is the set of disease nodes adjacent to symptom $s_j$.
- $N_P(s_j)$ is the set of patient nodes adjacent to symptom $s_j$.
- $N_S(p_k)$ is the set of symptom nodes adjacent to patient $p_k$.
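The layer update described above can be sketched as one message-passing layer in NumPy. This is a minimal sketch under the following assumptions:

- Each node averages the layer-l embeddings of its neighbors. The source's exact normalization, and whether a node's own embedding is included, cannot be recovered from this text.
- Disease-symptom messages use `W_DS` and patient-symptom messages use `W_PS`.
- `tanh` stands in for the unspecified activation σ.

The function names are hypothetical.

```python
import numpy as np

def mean_aggregate(adj, h_neighbors):
    """Average neighbor embeddings row by row; nodes without neighbors get zeros."""
    deg = adj.sum(axis=1, keepdims=True)
    summed = adj @ h_neighbors
    return np.divide(summed, deg, out=np.zeros_like(summed), where=deg > 0)

def encoder_layer(A_DS, A_PS, h_d, h_s, h_p, W_DS, W_PS, act=np.tanh):
    """One assumed heterogeneous message-passing layer (layer l -> layer l+1)."""
    new_d = act(mean_aggregate(A_DS, h_s) @ W_DS)           # diseases <- adjacent symptoms
    new_s = act(mean_aggregate(A_DS.T, h_d) @ W_DS          # symptoms <- adjacent diseases
                + mean_aggregate(A_PS.T, h_p) @ W_PS)       #          + adjacent patients
    new_p = act(mean_aggregate(A_PS, h_s) @ W_PS)           # patients <- adjacent symptoms
    return new_d, new_s, new_p

rng = np.random.default_rng(0)
A_DS = np.array([[1, 1, 0], [0, 0, 1]])   # 2 diseases x 3 symptoms
A_PS = np.array([[1, 0, 1], [0, 1, 0]])   # 2 patients x 3 symptoms
dim = 4
h_d, h_s, h_p = (rng.normal(size=(n, dim)) for n in (2, 3, 2))
W_DS, W_PS = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
h_d, h_s, h_p = encoder_layer(A_DS, A_PS, h_d, h_s, h_p, W_DS, W_PS)
```

Stacking several such layers lets information flow from diseases through symptoms to patients and back. This is how the description's claim of learning both local and global graph information would play out.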

Further, in the graph decoder, the multi-label hierarchical classification for patient diagnosis prediction comprises the following.

Build the disease hierarchy. The disease types of the disease layer are $D=\{d_1,\dots,d_{|D|}\}$. The disease system classification layer is $C=\{c_1,\dots,c_{|C|}\}$, where $|C|$ is the number of disease system categories.

Build a multi-label hierarchical classifier consisting of $|D|+|C|$ binary classifiers: one for each system category and one for each disease. Input the node embedding $h_{p_k}$ of patient $p_k$ into each binary classifier to obtain $|D|+|C|$ predicted probabilities. The label of a system-category classifier is the patient's disease system category. The label of a disease classifier is the patient's diagnosis. The corresponding model parameters are denoted $\theta$.

The probability $\hat y_{ki}$ that patient $p_k$ has disease $d_i$ is computed from two probabilities. Write $c(d_i)$ for the system category of $d_i$. The two probabilities are:

- the probability, predicted by the binary classifier for $d_i$, that the patient has $d_i$;
- the probability, predicted by the binary classifier for $c(d_i)$, that the patient presents system category $c(d_i)$.

The multi-label hierarchical classification loss $\mathcal{L}_{H}$ is computed from the following quantities:

- $y_{ki}$, the true label indicating whether patient $p_k$ has disease $d_i$;
- the true labels of the system categories corresponding to the patient's diagnoses;
- the L1 norm $\lVert\cdot\rVert_1$;
- the similarity $\mathrm{sim}(d_i,d_j)$ between diseases $d_i$ and $d_j$.

The similarity is computed from the true label distributions of the two diseases over all patients, $\mathbf{y}_{\cdot i}=(y_{1i},\dots,y_{|P|i})$ and $\mathbf{y}_{\cdot j}=(y_{1j},\dots,y_{|P|j})$. Here $y_{ki}$ and $y_{kj}$ are the true labels indicating whether patient $p_k$ has $d_i$ and $d_j$, respectively.
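The hierarchical prediction can be sketched as follows. The sketch rests on these assumptions:

- Each binary classifier is a logistic regression on the patient embedding.
- A disease's probability is the product of the disease-classifier output and the output for the disease's system category. This is a common way to make a child prediction depend on its parent. The source's exact combination is not recoverable from this text.
- Binary cross-entropy is the loss for both layers.
- Cosine similarity is one possible label-distribution similarity.

The source's similarity-weighted L1 term is not reproduced. The category assignment, the shapes and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-12):
    """Mean binary cross-entropy between probabilities p and 0/1 labels y."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def hierarchical_predict(h_p, W_dis, W_sys, disease_to_system):
    """h_p: |P| x k; W_dis: k x |D|; W_sys: k x |C|; disease_to_system[i] = c(d_i)."""
    p_dis = sigmoid(h_p @ W_dis)                   # |D| disease-level binary classifiers
    p_sys = sigmoid(h_p @ W_sys)                   # |C| system-category binary classifiers
    y_hat = p_dis * p_sys[:, disease_to_system]    # assumed product combination
    return y_hat, p_sys

def label_similarity(Y):
    """Cosine similarity between label columns y_.i of a |P| x |D| matrix (assumed measure)."""
    norms = np.linalg.norm(Y, axis=0, keepdims=True)
    norms[norms == 0] = 1.0
    Yn = Y / norms
    return Yn.T @ Yn

rng = np.random.default_rng(0)
disease_to_system = np.array([0, 0, 1])              # hypothetical c(d_i) for 3 diseases
Y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])      # true labels y_ki for 3 patients
# A patient has category c if any of their diagnosed diseases belongs to c.
Y_sys = np.stack([Y[:, disease_to_system == c].max(axis=1) for c in range(2)], axis=1)
y_hat, p_sys = hierarchical_predict(rng.normal(size=(3, 4)), rng.normal(size=(4, 3)),
                                    rng.normal(size=(4, 2)), disease_to_system)
loss_h = bce(y_hat, Y) + bce(p_sys, Y_sys)
```

With the product form, a rare disease's prediction is supported by its system-category classifier. That classifier is trained on every patient in the category. This matches the stated motivation of improving prediction for diseases with few samples.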

Further, in the graph decoder, disease contrastive learning comprises the following.

Combine the diseases in disease set $D$ pairwise to obtain the disease-pair set $DD$, which contains $|D|(|D|-1)/2$ pairs. For any pair $(d_i,d_j)\in DD$, the pair label is:

- $y^{DD}_{ij}=1$ if the two diseases belong to the same system category;
- $y^{DD}_{ij}=0$ if they belong to different system categories.

Build the disease-pair system-category discriminator $g$. Input the node embeddings $h_{d_i}$ and $h_{d_j}$ of the two diseases into $g$. The distance between the two diseases is then

$\delta_{ij}=\lVert g(h_{d_i})-g(h_{d_j})\rVert_2$

where $\lVert\cdot\rVert_2$ denotes the L2 norm.

The disease contrastive learning loss $\mathcal{L}_{C}$ is computed over the pairs in $DD$ from $\delta_{ij}$ and $y^{DD}_{ij}$. Here $m$ is the lower bound on the distance between the embeddings of diseases in different system categories.
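Here is a sketch of disease contrastive learning. It assumes a linear discriminator g and the standard margin-based contrastive loss:

- squared distance for same-category pairs;
- squared hinge max(0, m - δ) for different-category pairs.

This is one common form consistent with the description of m as a lower bound on between-category distance. It is not necessarily the source's exact formula. `contrastive_loss` and `W_g` are hypothetical names.

```python
import numpy as np
from itertools import combinations

def contrastive_loss(h_d, disease_to_system, W_g, margin=1.0):
    """Assumed margin-based contrastive loss over all unordered disease pairs."""
    z = h_d @ W_g                                   # hypothetical linear discriminator g
    losses = []
    for i, j in combinations(range(len(h_d)), 2):   # |D|(|D|-1)/2 pairs
        dist = np.linalg.norm(z[i] - z[j])          # L2 distance delta_ij
        if disease_to_system[i] == disease_to_system[j]:
            losses.append(dist ** 2)                # pull same-category diseases together
        else:
            losses.append(max(0.0, margin - dist) ** 2)  # push others at least `margin` apart
    return float(np.mean(losses))

h = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]])
print(contrastive_loss(h, [0, 0, 1], np.eye(2)))    # prints 0.5
```

In the example, the same-category pair (d1, d2) is 1.0 apart and contributes 1.0. Each cross-category pair is 0.5 apart, inside the margin of 1.0, and contributes 0.25. The mean is therefore 0.5.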

Further, in the graph decoder, disease-symptom relationship learning comprises the following.

Pair each disease in disease set $D$ with each symptom in symptom set $S$. This gives the disease-symptom pair set $DS$ with $|D|\times|S|$ pairs. For any pair $(d_i,s_j)\in DS$, the pair label is:

- $y^{DS}_{ij}=1$ if the disease and symptom are related in the disease-symptom knowledge map;
- $y^{DS}_{ij}=0$ if they are not.

Build the disease-symptom relationship learner $q$. Input the node embeddings $h_{d_i}$ and $h_{s_j}$ of the disease and symptom in $(d_i,s_j)$ into $q$. The probability $\hat r_{ij}$ that they are related is obtained by applying the sigmoid function to the learner's output.

The disease-symptom relationship learning loss $\mathcal{L}_{R}$ is computed over all pairs in $DS$ from $\hat r_{ij}$ and $y^{DS}_{ij}$.
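Here is a sketch of disease-symptom relationship learning over all |D| x |S| pairs. Two parts are assumptions: the bilinear learner (score = h_d W_r h_s, followed by the sigmoid) and binary cross-entropy as the loss. `relation_loss` and `W_r` are hypothetical names. The knowledge-map adjacency matrix supplies the pair labels directly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relation_loss(h_d, h_s, A_DS, W_r, eps=1e-12):
    """Score every (d_i, s_j) pair; A_DS[i, j] = 1 if they are linked in the knowledge map."""
    probs = sigmoid(h_d @ W_r @ h_s.T)              # |D| x |S| relation probabilities
    clipped = np.clip(probs, eps, 1 - eps)
    y = A_DS.astype(float)
    bce = -(y * np.log(clipped) + (1 - y) * np.log(1 - clipped))
    return probs, float(bce.mean())

rng = np.random.default_rng(0)
A_DS = np.array([[1, 1, 0], [0, 0, 1]])             # 2 diseases x 3 symptoms
h_d, h_s = rng.normal(size=(2, 4)), rng.normal(size=(3, 4))
probs, loss_r = relation_loss(h_d, h_s, A_DS, rng.normal(size=(4, 4)))
```

During training, this term is added to the hierarchical classification and contrastive losses to give the model's total loss, as the description states. This keeps the learned embeddings consistent with the knowledge map even where the EMR data is noisy.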

The present invention has the following advantageous effects.

It effectively integrates the expert knowledge in a knowledge map with EMR data to construct a heterogeneous graph network. On this network, a graph convolutional neural network learns both local and global information. The disease diagnosis model can be trained end-to-end on knowledge and data together.

The optimization objective covers more than the disease prediction task. It also adds supervision on knowledge relationships through the disease contrastive learning part and the disease-symptom relationship learning part. This ensures that the disease prediction task makes effective use of the knowledge, and that data noise does not affect the knowledge representation.

Finally, many diseases must be predicted, and some of them have only a few patients. The multi-label hierarchical classification improves prediction for these diseases with few samples.

FIG. 1 is a configuration diagram of the disease diagnosis prediction system based on a graph neural network according to an embodiment of the present invention.
FIG. 2 is a configuration diagram of the heterogeneous graph network according to an embodiment of the present invention.
FIG. 3 is a configuration diagram of the disease diagnosis model according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the hierarchical structure of diseases according to an embodiment of the present invention.

To make the above objects, features and advantages of the present invention clearer and easier to understand, specific embodiments of the invention are described in detail below with reference to the drawings.

The following description sets forth many specific details to provide a thorough understanding of the invention. However, the invention may also be implemented in ways other than those described here, and those skilled in the art can make similar extensions without departing from its spirit. The invention is therefore not limited to the specific embodiments disclosed below.

An embodiment of the present invention provides a disease diagnosis prediction system based on a graph neural network. As shown in FIG. 1, the system comprises four modules: a knowledge map construction module, a data extraction and preprocessing module, a disease diagnosis model construction module and a disease diagnosis model application module. Each module is described in detail below.

Knowledge map construction module: constructs a disease-symptom knowledge map from medical knowledge sources such as SNOMED-CT and HPO. The disease-symptom knowledge map contains two node types, disease and symptom, and one relation type, disease-symptom.
Data extraction and preprocessing module: extracts patient electronic medical record data from the electronic medical record system. This data includes patient disease diagnoses and symptom data and is stored in trigram format.
Disease diagnosis model construction module: performs graph neural network learning and predictive modeling on the disease-symptom knowledge map and the electronic medical record data.
Disease diagnosis model application module: uses the disease diagnosis model to predict disease diagnoses from the symptoms of a new input patient.

The disease diagnosis model construction module works on three given sets:
- a disease set $D=\{d_1,d_2,\dots,d_{|D|}\}$;
- a symptom set $S=\{s_1,s_2,\dots,s_{|S|}\}$;
- a patient set $P=\{p_1,p_2,\dots,p_{|P|}\}$.

Here $|D|$, $|S|$ and $|P|$ denote the number of disease types, the number of symptom types and the number of patients, respectively. Disease diagnosis prediction is treated as a multi-label classification problem: given a patient's symptoms, the disease diagnosis model predicts that patient's disease diagnoses.

The disease diagnosis model is realized through steps (1) to (6) below.

(1) Construction of the heterogeneous graph network
The disease-symptom knowledge map and the electronic medical record data are used to construct a heterogeneous graph network $G$ with three node types: disease, symptom and patient. Symptoms are intermediate nodes that connect diseases and patients.

The network integrates two relational subgraphs:
- the disease- and symptom-related subgraph of the disease-symptom knowledge map, which forms the disease-symptom subgraph $G_{DS}$;
- the patient- and symptom-related subgraph of the electronic medical record data, which forms the patient-symptom subgraph $G_{PS}$.

The heterogeneous graph network $G$ may be expressed as [Figure 0007459386000148]. Its node set is $V=D\cup S\cup P$, and its edge set is expressed as [Figure 0007459386000150]. The relation set $R$ contains the disease-symptom relation $r_{DS}$ and the patient-symptom relation $r_{PS}$. The disease-symptom relations are stored in a disease-symptom adjacency matrix, and the patient-symptom relations are stored in a patient-symptom adjacency matrix.

FIG. 2 shows an example of the heterogeneous graph network structure. It contains four patients, four diseases and four symptoms, together with the patient-symptom and disease-symptom relations among them.
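The Python sketch below shows one way to hold the two relation types in the disease-symptom and patient-symptom adjacency matrices. It builds dense 0/1 matrices from edge lists. It assumes that rows index diseases (or patients) and columns index symptoms. The function name `build_adjacency` and the toy node names and edges are hypothetical and do not come from the source.

```python
from typing import Dict, List, Sequence, Tuple


def build_adjacency(
    rows: Sequence[str], cols: Sequence[str], edges: Sequence[Tuple[str, str]]
) -> List[List[int]]:
    """Return a len(rows) x len(cols) 0/1 matrix with a 1 for every (row, col) edge."""
    row_idx: Dict[str, int] = {name: i for i, name in enumerate(rows)}
    col_idx: Dict[str, int] = {name: j for j, name in enumerate(cols)}
    adj = [[0] * len(cols) for _ in rows]
    for r, c in edges:
        adj[row_idx[r]][col_idx[c]] = 1  # repeated edges still give a single 1
    return adj


# Hypothetical toy data standing in for the knowledge map and the EMR records.
diseases = ["d1", "d2"]
symptoms = ["s1", "s2", "s3"]
patients = ["p1", "p2"]
disease_symptom_edges = [("d1", "s1"), ("d1", "s2"), ("d2", "s3")]
patient_symptom_edges = [("p1", "s1"), ("p2", "s2"), ("p2", "s3")]

A_DS = build_adjacency(diseases, symptoms, disease_symptom_edges)  # |D| x |S|
A_PS = build_adjacency(patients, symptoms, patient_symptom_edges)  # |P| x |S|
print(A_DS)  # [[1, 1, 0], [0, 0, 1]]
print(A_PS)  # [[1, 0, 0], [0, 1, 1]]
```

Dense lists keep the sketch dependency-free. A graph with thousands of diseases, symptoms and patients would more likely use a sparse matrix format.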

(2) Construction of the subgraphs
Disease-symptom subgraph $G_{DS}$: the disease-symptom relations are extracted from the disease-symptom knowledge map to construct the disease-symptom subgraph.
Patient-symptom subgraph $G_{PS}$: the patient disease diagnosis and symptom data in trigram format are used to construct the patient-symptom subgraph.
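The source does not specify the layout of the trigram-format records. The sketch below assumes each record is a (patient, predicate, entity) triple. It also assumes the predicates are named `has_symptom` and `diagnosed_with`; both names, and the helper `split_emr_triples`, are hypothetical. The function separates the patient-symptom edges needed for the subgraph from the diagnosis labels used later in training.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def split_emr_triples(
    triples: Iterable[Triple],
) -> Tuple[List[Tuple[str, str]], Dict[str, Set[str]]]:
    """Split (patient, predicate, entity) triples into symptom edges and diagnosis labels."""
    patient_symptom_edges: List[Tuple[str, str]] = []
    diagnoses: Dict[str, Set[str]] = defaultdict(set)
    for patient, predicate, entity in triples:
        if predicate == "has_symptom":  # hypothetical predicate name
            patient_symptom_edges.append((patient, entity))
        elif predicate == "diagnosed_with":  # hypothetical predicate name
            diagnoses[patient].add(entity)
        # any other predicate is ignored in this sketch
    return patient_symptom_edges, dict(diagnoses)


emr = [
    ("p1", "has_symptom", "s1"),
    ("p1", "diagnosed_with", "d1"),
    ("p2", "has_symptom", "s2"),
    ("p2", "has_symptom", "s3"),
    ("p2", "diagnosed_with", "d2"),
]
edges, labels = split_emr_triples(emr)
print(edges)   # [('p1', 's1'), ('p2', 's2'), ('p2', 's3')]
print(labels)  # {'p1': {'d1'}, 'p2': {'d2'}}
```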

(3) Structure of the disease diagnosis model
FIG. 3 shows an example of the disease diagnosis model structure. The model works as follows:
- Initial node embeddings for diseases, symptoms and patients are obtained from a disease-symptom co-occurrence matrix.
- The initial node embeddings and the adjacency matrices are the inputs to the disease diagnosis model.
- The model consists of two parts, a graph encoder and a graph decoder.

The generation of the initial node embeddings, the graph encoder and the graph decoder are described in steps (4) to (6).

(4) Generation of the initial node embeddings
First, a disease-symptom co-occurrence matrix $M\in\mathbb{R}^{|D|\times|S|}$ is constructed. The entry in row $i$ and column $j$ of $M$, denoted $M_{ij}$, is the number of patients in the electronic medical record data who were diagnosed with disease $d_i$ and presented symptom $s_j$.

Next, $M$ is row-normalized to obtain $M^{\mathrm{row}}$. The initial embedding $x_{d_i}$ of disease $d_i$ is the $i$-th row of $M^{\mathrm{row}}$.

Likewise, $M$ is column-normalized to obtain $M^{\mathrm{col}}$. The initial embedding $x_{s_j}$ of symptom $s_j$ is the $j$-th column of $M^{\mathrm{col}}$.

Finally, the initial embedding $x_{p_k}$ of patient $p_k$ is obtained by [Figure 0007459386000179], where $|S_{p_k}|$ is the number of symptoms of patient $p_k$.
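The initial-embedding step can be sketched in plain Python as follows. Two points are assumptions. First, row and column normalization are taken to mean dividing by the row or column sum. Second, the source gives the patient embedding only as an image; it is taken here to be the mean of the initial embeddings of the patient's symptoms. That reading fits the normalization by the patient's number of symptoms but is not confirmed. All helper names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Matrix = List[List[float]]


def cooccurrence(diseases: Sequence[str], symptoms: Sequence[str],
                 diagnoses: Dict[str, Set[str]],
                 symptoms_of: Dict[str, Set[str]]) -> Matrix:
    """M[i][j] = number of patients diagnosed with diseases[i] who present symptoms[j]."""
    M = [[0.0] * len(symptoms) for _ in diseases]
    for patient, diagnosed in diagnoses.items():
        observed = symptoms_of.get(patient, set())
        for i, d in enumerate(diseases):
            if d in diagnosed:
                for j, s in enumerate(symptoms):
                    if s in observed:
                        M[i][j] += 1.0
    return M


def row_normalize(M: Matrix) -> Matrix:
    """Divide each row by its sum (assumed meaning of row normalization)."""
    out = []
    for row in M:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def column_normalize(M: Matrix) -> Matrix:
    """Divide each column by its sum; all-zero columns stay zero."""
    col_sums = [sum(col) for col in zip(*M)]
    return [[v / c if c else 0.0 for v, c in zip(row, col_sums)] for row in M]


# Hypothetical toy EMR data.
diseases, symptoms = ["d1", "d2"], ["s1", "s2", "s3"]
diagnoses = {"p1": {"d1"}, "p2": {"d1"}, "p3": {"d2"}}
symptoms_of = {"p1": {"s1"}, "p2": {"s1", "s2"}, "p3": {"s2", "s3"}}

M = cooccurrence(diseases, symptoms, diagnoses, symptoms_of)  # [[2, 1, 0], [0, 1, 1]]
M_row, M_col = row_normalize(M), column_normalize(M)
x_d = {d: M_row[i] for i, d in enumerate(diseases)}                   # i-th row of M_row
x_s = {s: [row[j] for row in M_col] for j, s in enumerate(symptoms)}  # j-th column of M_col


def patient_embedding(patient: str) -> List[float]:
    """Assumed form: average of the initial embeddings of the patient's symptoms."""
    observed = sorted(symptoms_of.get(patient, set()))
    n = len(observed)  # number of symptoms of this patient
    if n == 0:
        return [0.0] * len(diseases)
    return [sum(x_s[s][k] for s in observed) / n for k in range(len(diseases))]


print(patient_embedding("p2"))  # [0.75, 0.25]
```

The vector lengths differ by node type. Disease vectors have one entry per symptom, while symptom and patient vectors have one entry per disease. This is consistent with the encoder first passing each type through a multilayer perceptron to reach a common dimension.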

(5) Graph encoder
First, the initial embeddings of each node type are fed into a multilayer perceptron to obtain initial embeddings of the same dimension. These are then fed into the graph encoder, which is implemented with a graph convolutional neural network. Within the graph encoder, nodes of different types pass information along the connecting edges of the graph and so integrate information from nodes of the other types.

The layer-$l$ node embeddings are computed as follows:
- For disease $d_i$: $h_{d_i}^{(l)}$ is computed by [Figure 0007459386000185].
- For symptom $s_j$: $h_{s_j}^{(l)}$ is computed by [Figure 0007459386000189].
- For patient $p_k$: $h_{p_k}^{(l)}$ is computed by [Figure 0007459386000193].

The symbols in these formulas are defined as follows:
- $\sigma$ is the activation function.
- $W_{DS}^{(l)}$ and $W_{PS}^{(l)}$ are the layer-$l$ disease-symptom and patient-symptom weight matrices obtained by training the disease diagnosis model.
- $h_{d_i}^{(l)}$, $h_{s_j}^{(l)}$ and $h_{p_k}^{(l)}$ are the embeddings of disease node $d_i$, symptom node $s_j$ and patient node $p_k$ at layer $l$. The graph encoder has $L$ layers in total.
- $N_S(d_i)$ is the set of symptom nodes adjacent to disease node $d_i$, and $N_D(s_j)$ is the set of disease nodes adjacent to symptom node $s_j$.
- $N_P(s_j)$ is the set of patient nodes adjacent to symptom node $s_j$, and $N_S(p_k)$ is the set of symptom nodes adjacent to patient node $p_k$.
- The terms [Figure 0007459386000211] and [Figure 0007459386000212] are obtained from the disease-symptom adjacency matrix. The terms [Figure 0007459386000213] and [Figure 0007459386000214] are obtained from the patient-symptom adjacency matrix.

Repeating this node embedding update $L$ times yields disease, symptom and patient node embeddings that fully capture the relations among them.
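The source gives the graph encoder's node-update formulas only as images. The following is therefore a sketch of one encoder layer under stated assumptions, not the patent's exact formula:
- Messages are summed over neighbors with the symmetric GCN normalization $1/\sqrt{|N(u)|\,|N(v)|}$. This is assumed to be what the adjacency-derived terms represent.
- One weight matrix is used per relation type, in both directions.
- The activation is ReLU.
- No self-loops are added.

Names such as `encoder_layer` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply weight matrix W (out_dim x in_dim) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def add_scaled(acc: Vector, v: Vector, c: float) -> None:
    """In place: acc += c * v."""
    for idx, val in enumerate(v):
        acc[idx] += c * val


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def encoder_layer(A_DS: Matrix, A_PS: Matrix, H_d: List[Vector], H_s: List[Vector],
                  H_p: List[Vector], W_DS: Matrix, W_PS: Matrix
                  ) -> Tuple[List[Vector], List[Vector], List[Vector]]:
    """One message-passing layer over the disease-symptom-patient graph."""
    deg_d = [sum(row) for row in A_DS]         # |N_S(d_i)|: symptoms per disease
    deg_sd = [sum(col) for col in zip(*A_DS)]  # |N_D(s_j)|: diseases per symptom
    deg_p = [sum(row) for row in A_PS]         # |N_S(p_k)|: symptoms per patient
    deg_sp = [sum(col) for col in zip(*A_PS)]  # |N_P(s_j)|: patients per symptom
    out_dim = len(W_DS)
    new_d = [[0.0] * out_dim for _ in H_d]
    new_s = [[0.0] * out_dim for _ in H_s]
    new_p = [[0.0] * out_dim for _ in H_p]
    for i, row in enumerate(A_DS):
        for j, linked in enumerate(row):
            if linked:  # both degrees are >= 1 here, so no division by zero
                c = 1.0 / math.sqrt(deg_d[i] * deg_sd[j])
                add_scaled(new_d[i], matvec(W_DS, H_s[j]), c)  # symptom -> disease
                add_scaled(new_s[j], matvec(W_DS, H_d[i]), c)  # disease -> symptom
    for k, row in enumerate(A_PS):
        for j, linked in enumerate(row):
            if linked:
                c = 1.0 / math.sqrt(deg_p[k] * deg_sp[j])
                add_scaled(new_p[k], matvec(W_PS, H_s[j]), c)  # symptom -> patient
                add_scaled(new_s[j], matvec(W_PS, H_p[k]), c)  # patient -> symptom
    return [relu(h) for h in new_d], [relu(h) for h in new_s], [relu(h) for h in new_p]


# Toy example with identity weights.
I2 = [[1.0, 0.0], [0.0, 1.0]]
A_DS = [[1, 1], [0, 1]]
A_PS = [[1, 0]]
H_d, H_s, H_p = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [2.0, 0.0]], [[0.0, 2.0]]
new_d, new_s, new_p = encoder_layer(A_DS, A_PS, H_d, H_s, H_p, I2, I2)
print(new_p)  # [[1.0, 1.0]]
```

Stacking this layer $L$ times, with separate weights per layer, gives the final embeddings. The text alone cannot settle whether the source adds self-loops, or whether symptom nodes sum or concatenate the messages arriving from diseases and from patients.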

(6) Graph decoder
The node embeddings produced by the graph encoder are fed into the graph decoder. The graph decoder uses them for multi-task learning, which consists of the three tasks below.

First task: multi-label hierarchical classification for patient disease diagnosis prediction.
First, as shown in FIG. 4, the disease hierarchical structure is used to build a layered hierarchy of diseases with two layers:
- The disease layer [Figure 0007459386000216] consists of the diseases in the disease set $D$, i.e., the diseases whose diagnosis is to be predicted. As stated above, there are $|D|$ disease types.
- The system classification layer [Figure 0007459386000218] consists of the system classes into which the diseases are grouped according to medical knowledge. It is denoted $C=\{c_1,\dots,c_{|C|}\}$, where $|C|$ is the number of disease system classes in this layer.

Next, a multi-label hierarchical classifier containing $|D|+|C|$ binary classifiers is constructed, and the binary classifiers are denoted [Figure 0007459386000224], [Figure 0007459386000225]. The node embedding of patient $p_k$ is fed into each of the $|D|+|C|$ binary classifiers, giving $|D|+|C|$ predicted probabilities denoted [Figure 0007459386000229].

The classifiers satisfy [Figure 0007459386000230] and are divided as follows:
- The labels of classifiers [Figure 0007459386000231] are the patient's disease system classes.
- The labels of classifiers [Figure 0007459386000232] are the patient's disease diagnoses.
- The corresponding model parameters are [Figure 0007459386000233].

Then, the probability [Figure 0007459386000236] that patient $p_k$ has disease $d_i$ is computed by [Figure 0007459386000237]. In this formula:
- [Figure 0007459386000238] is the probability, predicted by binary classifier [Figure 0007459386000239], that the patient has disease $d_i$.
- [Figure 0007459386000242] denotes the system class of disease $d_i$.
- [Figure 0007459386000243] is the probability, predicted by binary classifier [Figure 0007459386000244], that the patient belongs to disease system class [Figure 0007459386000245].
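Below is a sketch of the hierarchical combination step. It assumes that each binary classifier is a logistic regression on the patient embedding. It also assumes that a disease's probability is the product of its own classifier's output and the output of the classifier for its system class. The source gives the combining formula only as an image, so the product form is an assumption, as are all names and parameter values.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Classifier = Tuple[Vector, float]  # (weights, bias) of a logistic binary classifier


def logistic(clf: Classifier, h: Vector) -> float:
    """Binary classifier output sigmoid(w . h + b)."""
    w, b = clf
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))


def disease_probabilities(h_p: Vector, disease_clfs: Dict[str, Classifier],
                          class_clfs: Dict[str, Classifier],
                          system_class_of: Dict[str, str]) -> Dict[str, float]:
    """Combine |D| disease classifiers with |C| system-class classifiers.

    Assumption: P(disease) = p(disease classifier) * p(its system-class classifier).
    """
    class_prob = {c: logistic(clf, h_p) for c, clf in class_clfs.items()}
    return {d: logistic(clf, h_p) * class_prob[system_class_of[d]]
            for d, clf in disease_clfs.items()}


# Hypothetical parameters for two diseases in two system classes.
h_p = [1.0, 0.0]
disease_clfs = {"d1": ([0.0, 0.0], 0.0), "d2": ([2.0, 0.0], -2.0)}  # both output 0.5
class_clfs = {"c1": ([0.0, 0.0], 0.0), "c2": ([0.0, 0.0], math.log(3.0))}  # 0.5 and 0.75
system_class_of = {"d1": "c1", "d2": "c2"}
probs = disease_probabilities(h_p, disease_clfs, class_clfs, system_class_of)
print(probs)  # approximately {'d1': 0.25, 'd2': 0.375}
```

The product form means a disease can never receive a higher probability than its own system class. This keeps predictions consistent with the hierarchy.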

Finally, the loss function [Figure 0007459386000246] of the multi-label hierarchical classification is computed by [Figure 0007459386000247], [Figure 0007459386000248], [Figure 0007459386000249], [Figure 0007459386000250]. In these formulas:
- [Figure 0007459386000251] is the ground-truth label indicating whether patient $p_k$ has disease $d_i$.
- [Figure 0007459386000254] is the ground-truth label of the disease system class corresponding to the disease diagnosis of patient $p_k$.
- $\|\cdot\|_1$ denotes the L1 norm.
- [Figure 0007459386000257] is the similarity between diseases $d_i$ and $d_j$, computed by [Figure 0007459386000260].

In the similarity formula, [Figure 0007459386000261] denote the ground-truth label distributions of diseases $d_i$ and $d_j$, respectively. They satisfy [Figure 0007459386000264] and [Figure 0007459386000265], in which [Figure 0007459386000266] and [Figure 0007459386000267] are the ground-truth labels indicating whether patient $p_k$ has disease $d_i$ and disease $d_j$, respectively.
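A disease's "label distribution" can be read as the vector of its ground-truth labels over all patients. The sketch below builds those vectors and compares them with cosine similarity. The source's similarity formula is an image, so cosine similarity is an assumed stand-in, and the helper names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def label_distribution(disease: str, patients: Sequence[str],
                       diagnoses: Dict[str, Set[str]]) -> List[int]:
    """Ground-truth labels of one disease over all patients (1 = diagnosed)."""
    return [1 if disease in diagnoses.get(p, set()) else 0 for p in patients]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed similarity measure; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


patients = ["p1", "p2", "p3", "p4"]
diagnoses = {"p1": {"d1", "d2"}, "p2": {"d1"}, "p3": {"d2"}, "p4": {"d3"}}
y = {d: label_distribution(d, patients, diagnoses) for d in ["d1", "d2", "d3"]}
print(y["d1"], y["d2"])                                # [1, 1, 0, 0] [1, 0, 1, 0]
print(round(cosine_similarity(y["d1"], y["d2"]), 6))  # 0.5
print(cosine_similarity(y["d1"], y["d3"]))            # 0.0
```

Diseases that tend to be diagnosed in the same patients get a high similarity. Diseases that never co-occur get zero.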

Second task: disease contrastive learning.
First, the diseases in the disease set $D$ are combined in pairs to obtain the disease pair set $DD$. The number of disease pairs is [Figure 0007459386000271]. For any disease pair [Figure 0007459386000272] in $DD$, the pair label is set as follows:
- [Figure 0007459386000273] if the two diseases belong to the same system class;
- [Figure 0007459386000274] if they belong to different system classes.

Next, a disease-pair system-type discriminator [Figure 0007459386000275] is constructed. The node embeddings [Figure 0007459386000277] of the two diseases in a disease pair [Figure 0007459386000276] are fed into [Figure 0007459386000278]. The distance [Figure 0007459386000279] between the two diseases is then computed by [Figure 0007459386000280], where $\|\cdot\|_2$ denotes the L2 norm.

Finally, the loss function [Figure 0007459386000282] of disease contrastive learning is computed by [Figure 0007459386000283]. Here $m$ is the lower bound (margin) on the distance between the embeddings of diseases that belong to different system types.
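The pair construction and a margin-based loss can be sketched as below. The source's own formulas are images, so this is one plausible instance, not the patented definition. It makes these assumptions:
- Pairs are unordered, with no self-pairs (`itertools.combinations`).
- Same-class pairs are labeled 1 and different-class pairs 0.
- The distance is the Euclidean distance between the two disease embeddings.
- The loss is the classic contrastive loss $y\,\delta^2 + (1-y)\max(0, m-\delta)^2$, averaged over pairs.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[Tuple[str, str], int]


def disease_pairs(diseases: Sequence[str], system_class_of: Dict[str, str]) -> List[Pair]:
    """Unordered pairs of distinct diseases; label 1 if same system class, else 0."""
    return [((a, b), 1 if system_class_of[a] == system_class_of[b] else 0)
            for a, b in itertools.combinations(diseases, 2)]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(pairs: List[Pair], emb: Dict[str, Sequence[float]], m: float) -> float:
    """Pull same-class pairs together; push different-class pairs at least m apart."""
    total = 0.0
    for (a, b), y in pairs:
        dist = euclidean(emb[a], emb[b])
        total += y * dist ** 2 + (1 - y) * max(0.0, m - dist) ** 2
    return total / len(pairs)


system_class_of = {"d1": "c1", "d2": "c1", "d3": "c2"}  # hypothetical classes
pairs = disease_pairs(["d1", "d2", "d3"], system_class_of)
emb = {"d1": [0.0, 0.0], "d2": [3.0, 4.0], "d3": [0.0, 1.0]}
print(pairs)  # [(('d1', 'd2'), 1), (('d1', 'd3'), 0), (('d2', 'd3'), 0)]
print(contrastive_loss(pairs, emb, m=2.0))  # equals 26 / 3
```

With $|D|$ diseases this enumeration yields $|D|(|D|-1)/2$ pairs. If the source counts ordered pairs instead, the count would differ.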

Third task: disease-symptom relation learning.
First, one disease is taken from the disease set $D$ and one symptom from the symptom set $S$ at a time, giving the disease-symptom pair set $DS$. The number of disease-symptom pairs is [Figure 0007459386000284]. For any disease-symptom pair [Figure 0007459386000285] in $DS$, the pair label is set as follows:
- [Figure 0007459386000286] if the disease and the symptom are related in the disease-symptom knowledge map;
- [Figure 0007459386000287] if they are not related.

Next, a disease-symptom relation learner [Figure 0007459386000288] is constructed. The node embeddings [Figure 0007459386000290] of the disease and the symptom in pair [Figure 0007459386000289] are fed into [Figure 0007459386000291]. The probability [Figure 0007459386000293] that the disease and symptom in [Figure 0007459386000292] are related is computed by [Figure 0007459386000294], where [Figure 0007459386000295] denotes the sigmoid function.

The loss function [Figure 0007459386000296] of disease-symptom relation learning is computed by [Figure 0007459386000297]. The loss function [Figure 0007459386000298] of the disease diagnosis model is defined as [Figure 0007459386000299], i.e., as the sum of the losses of the three tasks.
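Finally, a sketch of the disease-symptom relation task and of the combined objective. The total loss as the plain sum of the three task losses follows the claims; everything else, including all names, is illustrative. The sketch assumes:
- Every (disease, symptom) combination forms a pair, labeled 1 if linked in the disease-symptom adjacency matrix and 0 otherwise.
- The relation probability is the sigmoid of the dot product of the two embeddings.
- The relation loss is averaged binary cross-entropy.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[Tuple[str, str], int]


def disease_symptom_pairs(diseases: Sequence[str], symptoms: Sequence[str],
                          A_DS: List[List[int]]) -> List[Pair]:
    """All |D| * |S| (disease, symptom) pairs, labeled from the adjacency matrix."""
    return [((d, s), A_DS[i][j]) for i, d in enumerate(diseases)
            for j, s in enumerate(symptoms)]


def relation_probability(h_d: Sequence[float], h_s: Sequence[float]) -> float:
    """Assumed scorer: sigmoid of the dot product of the two embeddings."""
    z = sum(a * b for a, b in zip(h_d, h_s))
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(p: float, y: int, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def relation_loss(pairs: List[Pair], emb_d: Dict[str, Sequence[float]],
                  emb_s: Dict[str, Sequence[float]]) -> float:
    losses = [binary_cross_entropy(relation_probability(emb_d[d], emb_s[s]), y)
              for (d, s), y in pairs]
    return sum(losses) / len(losses)


def total_loss(loss_hier: float, loss_contrast: float, loss_relation: float) -> float:
    """Overall objective: the sum of the three task losses."""
    return loss_hier + loss_contrast + loss_relation


pairs = disease_symptom_pairs(["d1"], ["s1", "s2"], [[1, 0]])
emb_d = {"d1": [0.0, 0.0]}
emb_s = {"s1": [1.0, 2.0], "s2": [3.0, -1.0]}
print(pairs)  # [(('d1', 's1'), 1), (('d1', 's2'), 0)]
loss_rel = relation_loss(pairs, emb_d, emb_s)  # both probabilities are 0.5, so this is ln 2
print(total_loss(1.0, 2.0, loss_rel))
```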

The foregoing describes only preferred embodiments of the present invention. Although the present invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit it. Using the methods and technical content disclosed above, those skilled in the art can make many possible variations and modifications to the technical solution of the present invention, or modify it into equivalent embodiments, without departing from the scope of that technical solution. Therefore, any simple modification, equivalent change or adaptation made to the above embodiments according to the technical idea of the present invention, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present invention.

Claims (9)

A disease diagnosis prediction system based on a graph neural network, comprising:
a knowledge map construction module that constructs a disease-symptom knowledge map from medical knowledge sources;
a data extraction and preprocessing module that extracts, from an electronic medical record system, patient electronic medical record data that includes patient disease diagnosis and symptom data and is stored in trigram format;
a disease diagnosis model construction module that performs graph neural network learning and predictive modeling on the disease-symptom knowledge map and the electronic medical record data; and
a disease diagnosis model application module that uses the disease diagnosis model to predict disease diagnoses from the symptoms of a new input patient;
wherein the graph neural network learning and predictive modeling comprise constructing a heterogeneous graph network and constructing a disease diagnosis model;
the heterogeneous graph network comprises a disease-symptom subgraph, constructed by extracting disease-symptom relations from the disease-symptom knowledge map, and a patient-symptom subgraph, constructed from the patient disease diagnosis and symptom data in trigram format;
the disease diagnosis model consists of a graph encoder and a graph decoder;
the graph encoder is implemented on the basis of a graph convolutional neural network and takes as input the initial node embeddings of diseases, symptoms and patients obtained from a disease-symptom co-occurrence matrix, together with a disease-symptom adjacency matrix and a patient-symptom adjacency matrix; nodes of different types pass information along connecting edges, and node embedding update operations yield disease, symptom and patient node embeddings, which are fed into the graph decoder;
the graph decoder performs multi-task learning using the node embeddings, the multi-task learning comprising part a), multi-label hierarchical classification for patient disease diagnosis prediction; part b), disease contrastive learning; and part c), disease-symptom relation learning;
in part a), a layered hierarchy of diseases is built from the disease hierarchical structure, the hierarchy comprising a disease layer containing the diseases to be predicted and a disease system classification layer derived from medical knowledge; the disease types of the disease layer are denoted [Figure 0007459386000300], the disease system classification layer is denoted [Figure 0007459386000301], and [Figure 0007459386000302] is the number of disease system classes;
a multi-label hierarchical classifier containing [Figure 0007459386000303] binary classifiers is built, the binary classifiers being denoted [Figure 0007459386000305] (where [Figure 0007459386000306]) and satisfying [Figure 0007459386000307];
the node embedding of patient $p_k$ is fed into each of the [Figure 0007459386000309] binary classifiers to obtain [Figure 0007459386000310] predicted probabilities, denoted [Figure 0007459386000311]; the labels of binary classifiers [Figure 0007459386000312] are the patient's disease system classes, the labels of binary classifiers [Figure 0007459386000313] are the patient's disease diagnoses, and the corresponding model parameters are [Figure 0007459386000314];
the probability [Figure 0007459386000317] that patient $p_k$ has disease $d_i$ is computed by [Figure 0007459386000318], where [Figure 0007459386000319] is the probability, predicted by binary classifier [Figure 0007459386000320], that the patient has disease $d_i$; and, denoting the system class of disease $d_i$ by [Figure 0007459386000323], [Figure 0007459386000324] is the probability, predicted by binary classifier [Figure 0007459386000325], that the patient belongs to disease system class [Figure 0007459386000326];
the loss function [Figure 0007459386000327] of the multi-label hierarchical classification is computed by [Figure 0007459386000328], [Figure 0007459386000329], [Figure 0007459386000330], [Figure 0007459386000331], where $|P|$ is the number of patients; [Figure 0007459386000333] is the ground-truth label indicating whether patient $p_k$ has disease $d_i$; [Figure 0007459386000336] is the ground-truth label of the disease system class corresponding to the disease diagnosis of patient $p_k$; $\|\cdot\|_1$ denotes the L1 norm; and [Figure 0007459386000339] is the similarity between disease $d_i$ and disease $d_j$, computed by [Figure 0007459386000342], where [Figure 0007459386000343] denote the ground-truth label distributions of disease $d_i$ and disease $d_j$, respectively, satisfying [Figure 0007459386000346] and [Figure 0007459386000347], in which [Figure 0007459386000348] and [Figure 0007459386000349] are the ground-truth labels indicating whether patient $p_k$ has disease $d_i$ and disease $d_j$, respectively;
in part b), a disease-pair system-type discriminator is constructed, the distance between the two diseases in a disease pair is computed, and a loss function for disease contrastive learning is designed;
in part c), a disease-symptom relation learner is constructed, the probability that the disease and the symptom in a disease-symptom pair are related is computed, and a loss function for disease-symptom relation learning is designed; and
the loss function of the disease diagnosis model is obtained as the sum of the loss function of the multi-label hierarchical classification, the loss function of the disease contrastive learning and the loss function of the disease-symptom relation learning.
The disease diagnosis prediction system based on a graph neural network according to claim 1, wherein, in the knowledge map construction module, the disease-symptom knowledge map includes two node types, disease and symptom, and one relation type, disease-symptom.
The disease diagnosis prediction system based on a graph neural network according to claim 1, wherein the heterogeneous graph network is constructed from the disease-symptom knowledge map and electronic medical record data and includes three node types: disease, symptom, and patient. Symptom nodes are intermediate nodes connecting diseases and patients. The heterogeneous graph network integrates the disease-symptom relational subgraph of the disease-symptom knowledge map with the patient-symptom relational subgraph of the electronic medical record data.
The disease diagnosis prediction system based on a graph neural network according to claim 1, wherein the heterogeneous graph network [Figure 0007459386000353] is denoted [Figure 0007459386000354]. Its node set is denoted [Figure 0007459386000355], where D, S, and P are a predetermined disease set, symptom set, and patient set, respectively, denoted [Figure 0007459386000356], and [Figure 0007459386000357] represent the number of disease types, the number of symptom types, and the number of patients, respectively. Its edge set is denoted [Figure 0007459386000358], where the set R comprises [Figure 0007459386000359], representing disease-symptom relations, and [Figure 0007459386000360], representing patient-symptom relations. The disease-symptom relations are stored in a disease-symptom adjacency matrix, and the patient-symptom relations are stored in a patient-symptom adjacency matrix.
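The claim fixes the node types and the two adjacency matrices but not how they are encoded. The following is a minimal sketch under three assumptions: nodes of each type are indexed from 0, each relation is supplied as a list of index pairs, and dense 0/1 matrices are used. It shows how the two adjacency matrices, and the neighbor sets the graph encoder needs later, could be derived. The helper names (`adjacency_matrix`, `neighbor_sets`) and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[int]]


def adjacency_matrix(n_rows: int, n_cols: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Dense 0/1 matrix with a 1 at (row, col) for every edge in `edges`."""
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for r, c in edges:
        matrix[r][c] = 1
    return matrix


def neighbor_sets(matrix: Matrix) -> Tuple[Dict[int, Set[int]], Dict[int, Set[int]]]:
    """Return (row node -> adjacent column nodes, column node -> adjacent row nodes)."""
    n_cols = len(matrix[0]) if matrix else 0
    by_row = {i: {j for j, v in enumerate(row) if v} for i, row in enumerate(matrix)}
    by_col = {j: {i for i, row in enumerate(matrix) if row[j]} for j in range(n_cols)}
    return by_row, by_col


# Hypothetical toy graph: |D| = 2 diseases, |S| = 3 symptoms, |P| = 2 patients.
A_ds = adjacency_matrix(2, 3, [(0, 0), (0, 1), (1, 2)])  # disease-symptom (knowledge map)
A_ps = adjacency_matrix(2, 3, [(0, 0), (1, 1), (1, 2)])  # patient-symptom (EMR data)

symptoms_of_disease, diseases_of_symptom = neighbor_sets(A_ds)
symptoms_of_patient, patients_of_symptom = neighbor_sets(A_ps)
```

Both matrices share the symptom axis. Symptom j therefore links disease i to patient k whenever both matrices hold a 1 in column j, which is what makes symptoms the intermediate nodes of the heterogeneous graph.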
The disease diagnosis prediction system based on a graph neural network according to claim 4, wherein generating the initial node embeddings comprises:
constructing a disease-symptom co-occurrence matrix [Figure 0007459386000361], wherein the element in row [Figure 0007459386000363] and column [Figure 0007459386000364] of the matrix [Figure 0007459386000362] is denoted [Figure 0007459386000365] and represents the number of patients in the electronic medical record data who were diagnosed with disease [Figure 0007459386000366] and developed symptom [Figure 0007459386000367];
row-normalizing [Figure 0007459386000368] to obtain [Figure 0007459386000369], wherein the initial embedding of disease [Figure 0007459386000370] is [Figure 0007459386000371], which denotes row [Figure 0007459386000373] of [Figure 0007459386000372];
column-normalizing [Figure 0007459386000374] to obtain [Figure 0007459386000375], wherein the initial embedding of symptom [Figure 0007459386000376] is [Figure 0007459386000377], which denotes column [Figure 0007459386000379] of [Figure 0007459386000378]; and
computing the initial embedding [Figure 0007459386000381] of patient [Figure 0007459386000380] by [Figure 0007459386000382], where [Figure 0007459386000383] is the number of symptoms of patient [Figure 0007459386000384].
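The co-occurrence construction and the two normalizations can be sketched as follows. The claim's formula images are not recoverable here, so three choices in this sketch are assumptions:

- normalization divides by the row or column sum (L1 normalization);
- an all-zero row or column maps to zeros;
- the patient embedding is the mean of the patient's symptom embeddings. This fits the patient's symptom count appearing in that formula but is not confirmed by the text.

Function names are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def cooccurrence(n_diseases: int, n_symptoms: int,
                 records: List[Tuple[Set[int], Set[int]]]) -> Matrix:
    """M[i][j] = number of patients diagnosed with disease i who had symptom j.

    Each record is a (diagnosed disease ids, symptom ids) pair for one patient.
    """
    M = [[0.0] * n_symptoms for _ in range(n_diseases)]
    for diseases, symptoms in records:
        for i in diseases:
            for j in symptoms:
                M[i][j] += 1.0
    return M


def row_normalize(M: Matrix) -> Matrix:
    # ASSUMPTION: L1 normalization (divide by the row sum); all-zero rows stay zero.
    out = []
    for row in M:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def col_normalize(M: Matrix) -> Matrix:
    # ASSUMPTION: L1 normalization (divide by the column sum); all-zero columns stay zero.
    totals = [sum(col) for col in zip(*M)]
    return [[v / t if t else 0.0 for v, t in zip(row, totals)] for row in M]


def patient_embedding(symptom_ids: Set[int], M_S: Matrix) -> List[float]:
    # ASSUMPTION: mean of the patient's symptom embeddings (columns of M_S).
    if not symptom_ids:
        return [0.0] * len(M_S)
    cols = [[row[j] for row in M_S] for j in sorted(symptom_ids)]
    return [sum(values) / len(cols) for values in zip(*cols)]


# Hypothetical toy EMR data: three patients, |D| = 2, |S| = 3.
records = [({0}, {0, 1}), ({0}, {0}), ({1}, {1, 2})]
M = cooccurrence(2, 3, records)
M_D = row_normalize(M)  # disease i's initial embedding is M_D[i] (length |S|)
M_S = col_normalize(M)  # symptom j's initial embedding is column j of M_S (length |D|)
h_p0 = patient_embedding(records[0][1], M_S)
```

Under these assumptions, disease embeddings have length |S| while symptom and patient embeddings have length |D|. The node types therefore need a projection to a common dimension before they enter the graph encoder.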
The disease diagnosis prediction system based on a graph neural network according to claim 1, wherein the initial embeddings of each node type are separately fed into a multilayer perceptron to obtain initial embeddings of the same dimension, which are then input into the graph encoder.
The disease diagnosis prediction system based on a graph neural network according to claim 5, wherein, in the graph encoder:
for disease [Figure 0007459386000385], the node embedding [Figure 0007459386000387] at layer [Figure 0007459386000386] is computed by [Figure 0007459386000388];
for symptom [Figure 0007459386000389], the node embedding [Figure 0007459386000391] at layer [Figure 0007459386000390] is computed by [Figure 0007459386000392];
for patient [Figure 0007459386000393], the node embedding [Figure 0007459386000395] at layer [Figure 0007459386000394] is computed by [Figure 0007459386000396];
where [Figure 0007459386000397] is an activation function;
[Figure 0007459386000398] are, respectively, the layer-[Figure 0007459386000399] disease-symptom relation weight matrix and patient-symptom relation weight matrix obtained by training the disease diagnosis model;
[Figure 0007459386000400] are the node embeddings at layer [Figure 0007459386000404] of disease [Figure 0007459386000401], symptom [Figure 0007459386000402], and patient [Figure 0007459386000403], respectively;
[Figure 0007459386000405] is the set of symptom nodes adjacent to disease [Figure 0007459386000406];
[Figure 0007459386000407] is the set of disease nodes adjacent to symptom [Figure 0007459386000408];
[Figure 0007459386000409] is the set of patient nodes adjacent to symptom [Figure 0007459386000410]; and
[Figure 0007459386000411] is the set of symptom nodes adjacent to patient [Figure 0007459386000412].
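The three layer-update formulas survive only as images. The sketch below therefore substitutes a common relational message-passing form, loosely in the style of R-GCN, and should be read as an assumption throughout:

- each node averages its neighbors' embeddings after multiplying them by the relation's weight matrix;
- a symptom sums the averaged messages from its disease neighbors and its patient neighbors;
- ReLU stands in for the activation function;
- there is no self-loop term.

The claim names only a disease-symptom and a patient-symptom weight matrix per layer, so the sketch reuses each matrix in both directions. Inputs are assumed to be already projected to a common dimension, as the multilayer-perceptron step requires. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Vec = List[float]
Mat = List[List[float]]
Emb = Dict[int, Vec]
Nbrs = Dict[int, Set[int]]


def matvec(W: Mat, x: Vec) -> Vec:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x: Vec) -> Vec:
    # ASSUMPTION: ReLU as the activation function.
    return [max(0.0, v) for v in x]


def mean_message(W: Mat, neighbor_ids: Set[int], H: Emb) -> Vec:
    """Average of W @ h_n over the neighbors n; zero vector if there are none."""
    if not neighbor_ids:
        return [0.0] * len(W)
    msgs = [matvec(W, H[n]) for n in neighbor_ids]
    return [sum(col) / len(msgs) for col in zip(*msgs)]


def encoder_layer(H_d: Emb, H_s: Emb, H_p: Emb, W_ds: Mat, W_ps: Mat,
                  sym_of_dis: Nbrs, dis_of_sym: Nbrs,
                  pat_of_sym: Nbrs, sym_of_pat: Nbrs) -> Tuple[Emb, Emb, Emb]:
    """One layer: maps layer-l embeddings to layer-(l+1) disease, symptom, patient embeddings."""
    new_d = {d: relu(mean_message(W_ds, sym_of_dis[d], H_s)) for d in H_d}
    new_p = {p: relu(mean_message(W_ps, sym_of_pat[p], H_s)) for p in H_p}
    new_s = {}
    for s in H_s:
        from_d = mean_message(W_ds, dis_of_sym[s], H_d)  # disease -> symptom messages
        from_p = mean_message(W_ps, pat_of_sym[s], H_p)  # patient -> symptom messages
        new_s[s] = relu([a + b for a, b in zip(from_d, from_p)])
    return new_d, new_s, new_p


# Hypothetical toy inputs: one disease, two symptoms, one patient, dimension 2.
W_ds = [[1.0, 0.0], [0.0, 1.0]]
W_ps = [[2.0, 0.0], [0.0, 2.0]]
H_d = {0: [1.0, 1.0]}
H_s = {0: [1.0, 0.0], 1: [0.0, 1.0]}
H_p = {0: [1.0, -1.0]}
new_d, new_s, new_p = encoder_layer(
    H_d, H_s, H_p, W_ds, W_ps,
    sym_of_dis={0: {0, 1}}, dis_of_sym={0: {0}, 1: {0}},
    pat_of_sym={0: {0}, 1: set()}, sym_of_pat={0: {0}},
)
```

Stacking several calls of `encoder_layer`, each with its own pair of weight matrices, gives the multi-layer encoder. Symptom 1 has no patient neighbor here, so only its disease message contributes.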
The disease diagnosis prediction system based on a graph neural network according to claim 7, wherein, in the graph decoder, the disease contrastive learning comprises:
combining the diseases in disease set D two at a time to obtain a disease pair set DD, the number of disease pairs being [Figure 0007459386000413]; for any disease pair [Figure 0007459386000414] in DD, the disease pair label is [Figure 0007459386000415] if the two diseases belong to the same system classification and [Figure 0007459386000416] if they belong to different system classifications;
constructing a disease pair system type discriminator [Figure 0007459386000417], inputting the node embeddings [Figure 0007459386000419] of the two diseases in disease pair [Figure 0007459386000418] into [Figure 0007459386000420], and computing the distance [Figure 0007459386000421] between the two diseases by [Figure 0007459386000422], where [Figure 0007459386000423] denotes the L2 norm; and
computing the loss function [Figure 0007459386000424] of disease contrastive learning by [Figure 0007459386000425], where m is the lower bound on the distance between embeddings of diseases of different system types.
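Only two things here come from the claim: the L2 distance, and the role of m as a lower bound on inter-system distance. The sketch below assumes the hidden loss formula is the standard margin-based contrastive loss:

- same-system pairs (labelled 1 here) are pulled together by their squared distance;
- different-system pairs (labelled 0 here) are penalized only when they are closer than m.

The label values, the squared terms, and averaging over all pairs are assumptions. Function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List


def l2_distance(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def disease_contrastive_loss(H: Dict[int, List[float]], system: Dict[int, str],
                             margin: float) -> float:
    """Margin-based contrastive loss averaged over all unordered disease pairs.

    ASSUMPTION: label 1 = same system class, 0 = different; the standard
    y * d^2 + (1 - y) * max(0, m - d)^2 form stands in for the formula image.
    """
    pairs = list(combinations(sorted(H), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        y = 1.0 if system[i] == system[j] else 0.0
        d = l2_distance(H[i], H[j])
        total += y * d ** 2 + (1.0 - y) * max(0.0, margin - d) ** 2
    return total / len(pairs)


# Hypothetical embeddings and system classes for three diseases.
H = {0: [0.0, 0.0], 1: [3.0, 4.0], 2: [0.0, 1.0]}
system = {0: "respiratory", 1: "respiratory", 2: "circulatory"}
loss = disease_contrastive_loss(H, system, margin=2.0)
```

Pairing diseases two at a time with `itertools.combinations` yields |D|(|D|-1)/2 unordered pairs. In the toy data these contribute 25 (same system, distance 5), 1 (different systems, distance 1 < m), and 0 (different systems, distance above m).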
The disease diagnosis prediction system based on a graph neural network according to claim 7, wherein, in the graph decoder, the disease-symptom relationship learning comprises:
selecting one disease from disease set D and one symptom from symptom set S to obtain a disease-symptom pair set DS, the number of disease-symptom pairs being [Figure 0007459386000426]; for any disease-symptom pair [Figure 0007459386000427] in DS, the disease-symptom pair label is [Figure 0007459386000428] if the disease and symptom are related in the disease-symptom knowledge map and [Figure 0007459386000429] if they are not;
constructing a disease-symptom relationship learner [Figure 0007459386000430], inputting the node embeddings [Figure 0007459386000432] of the disease and symptom in [Figure 0007459386000431] into [Figure 0007459386000433], and computing by [Figure 0007459386000436] the probability [Figure 0007459386000435] that the disease and symptom in [Figure 0007459386000434] are related, where [Figure 0007459386000437] denotes the sigmoid function; and
computing the loss function [Figure 0007459386000438] of disease-symptom relationship learning by [Figure 0007459386000439].
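Of this step, the claim confirms only the sigmoid and the knowledge-map labels. The sketch below makes two further assumptions:

- the score passed to the sigmoid is the inner product of the disease and symptom embeddings;
- the loss is binary cross-entropy averaged over all |D| x |S| pairs.

Function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relation_probability(h_d: List[float], h_s: List[float]) -> float:
    # ASSUMPTION: the score fed to the sigmoid is the inner product h_d . h_s.
    return sigmoid(sum(a * b for a, b in zip(h_d, h_s)))


def disease_symptom_loss(H_d: Dict[int, List[float]], H_s: Dict[int, List[float]],
                         kg_edges: Set[Tuple[int, int]], eps: float = 1e-12) -> float:
    """Binary cross-entropy averaged over all |D| x |S| disease-symptom pairs.

    ASSUMPTION: label 1 if (disease, symptom) is an edge of the knowledge map, else 0.
    """
    total, n_pairs = 0.0, 0
    for d, h_d in H_d.items():
        for s, h_s in H_s.items():
            y = 1.0 if (d, s) in kg_edges else 0.0
            p = relation_probability(h_d, h_s)
            total -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
            n_pairs += 1
    return total / n_pairs


# Hypothetical toy data: one disease, two symptoms, one known relation.
H_d = {0: [0.0, 0.0]}
H_s = {0: [1.0, 1.0], 1: [2.0, 2.0]}
loss = disease_symptom_loss(H_d, H_s, kg_edges={(0, 0)})
```

With a zero disease embedding, every predicted probability is 0.5, so each pair contributes about ln 2 regardless of its label. The small `eps` guards against `log(0)` when a probability saturates.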
JP2023536567A 2021-12-27 2022-09-05 Disease diagnosis prediction system based on graph neural network Active JP7459386B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN202111609275.1A CN113990495B (en) 2021-12-27 2021-12-27 Disease diagnosis prediction system based on graph neural network
CN202111609275.1 2021-12-27
PCT/CN2022/116970 WO2023124190A1 (en) 2021-12-27 2022-09-05 Graph neural network-based disease diagnosis and prediction system

Publications (2)

Publication Number Publication Date
JP2024503980A JP2024503980A (en) 2024-01-30
JP7459386B2 true JP7459386B2 (en) 2024-04-01

Family

ID=79734519

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2023536567A Active JP7459386B2 (en) 2021-12-27 2022-09-05 Disease diagnosis prediction system based on graph neural network

Country Status (3)

Country Link
JP (1) JP7459386B2 (en)
CN (1) CN113990495B (en)
WO (1) WO2023124190A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113990495B (en) * 2021-12-27 2022-04-29 之江实验室 Disease diagnosis prediction system based on graph neural network
CN114496283A (en) * 2022-02-15 2022-05-13 山东大学 Disease prediction system based on path reasoning, storage medium and equipment
CN114496234B (en) * 2022-04-18 2022-07-19 浙江大学 Cognitive-atlas-based personalized diagnosis and treatment scheme recommendation system for general patients
CN114898879B (en) * 2022-05-10 2023-04-21 电子科技大学 Chronic disease risk prediction method based on graph representation learning
CN114664452B (en) * 2022-05-20 2022-09-23 之江实验室 General multi-disease prediction system based on causal verification data generation
CN115019923B (en) * 2022-07-11 2023-04-28 中南大学 Electronic medical record data pre-training method based on contrast learning
CN115359870B (en) * 2022-10-20 2023-03-24 之江实验室 Disease diagnosis and treatment process abnormity identification system based on hierarchical graph neural network
CN115424724B (en) * 2022-11-04 2023-01-24 之江实验室 Lung cancer lymph node metastasis auxiliary diagnosis system for multi-modal forest
CN115862848B (en) * 2023-02-15 2023-05-30 之江实验室 Disease prediction system and device based on clinical data screening and medical knowledge graph
CN116072298B (en) * 2023-04-06 2023-08-15 之江实验室 Disease prediction system based on hierarchical marker distribution learning
CN116646072A (en) * 2023-05-18 2023-08-25 肇庆医学高等专科学校 Training method and device for prostate diagnosis neural network model
CN116562266B (en) * 2023-07-10 2023-09-15 中国医学科学院北京协和医院 Text analysis method, computer device, and computer-readable storage medium
CN116631641B (en) * 2023-07-21 2023-12-22 之江实验室 Disease prediction device integrating self-adaptive similar patient diagrams
CN116936108B (en) * 2023-09-19 2024-01-02 之江实验室 Unbalanced data-oriented disease prediction system
CN117010494B (en) * 2023-09-27 2024-01-05 之江实验室 Medical data generation method and system based on causal expression learning
CN117012374B (en) * 2023-10-07 2024-01-26 之江实验室 Medical follow-up system and method integrating event map and deep reinforcement learning
CN117235487B (en) * 2023-10-12 2024-03-12 北京大学第三医院(北京大学第三临床医学院) Feature extraction method and system for predicting hospitalization event of asthma patient
CN117409911B (en) * 2023-10-13 2024-05-07 四川大学 Electronic medical record representation learning method based on multi-view contrast learning
CN117438023B (en) * 2023-10-31 2024-04-26 灌云县南岗镇卫生院 Hospital information management method and system based on big data
CN117894422A (en) * 2024-03-18 2024-04-16 攀枝花学院 ICU severe monitoring-based data visualization method and system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN111666477A (en) 2020-06-19 2020-09-15 腾讯科技(深圳)有限公司 Data processing method and device, intelligent equipment and medium
CN111914562A (en) 2020-08-21 2020-11-10 腾讯科技(深圳)有限公司 Electronic information analysis method, device, equipment and readable storage medium
CN113656589A (en) 2021-04-19 2021-11-16 腾讯科技(深圳)有限公司 Object attribute determination method and device, computer equipment and storage medium
CN113674856A (en) 2021-04-15 2021-11-19 腾讯科技(深圳)有限公司 Medical data processing method, device, equipment and medium based on artificial intelligence

Family Cites Families (20)

Publication number Priority date Publication date Assignee Title
US7774143B2 (en) * 2002-04-25 2010-08-10 The United States Of America As Represented By The Secretary, Department Of Health And Human Services Methods for analyzing high dimensional data for classifying, diagnosing, prognosticating, and/or predicting diseases and other biological states
US20130268290A1 (en) * 2012-04-02 2013-10-10 David Jackson Systems and methods for disease knowledge modeling
PL407244A1 (en) * 2014-02-18 2015-08-31 Instytut Biochemii I Biofizyki Polskiej Akademii Nauk Electrochemical bio-sensor for detecting S100B protein
US20150356272A1 (en) * 2014-06-10 2015-12-10 Taipei Medical University Prescription analysis system and method for applying probabilistic model based on medical big data
US20190155993A1 (en) * 2017-11-20 2019-05-23 ThinkGenetic Inc. Method and System Supporting Disease Diagnosis
CN108154928A (en) * 2017-12-27 2018-06-12 北京嘉和美康信息技术有限公司 A kind of methods for the diagnosis of diseases and device
CN108198620B (en) * 2018-01-12 2022-03-22 洛阳飞来石软件开发有限公司 Skin disease intelligent auxiliary diagnosis system based on deep learning
CN109036553B (en) * 2018-08-01 2022-03-29 北京理工大学 Disease prediction method based on automatic extraction of medical expert knowledge
US11636949B2 (en) * 2018-08-10 2023-04-25 Kahun Medical Ltd. Hybrid knowledge graph for healthcare applications
CN109784387A (en) * 2018-12-29 2019-05-21 天津南大通用数据技术股份有限公司 Multi-level progressive classification method and system based on neural network and Bayesian model
CN110277165B (en) * 2019-06-27 2021-06-04 清华大学 Auxiliary diagnosis method, device, equipment and storage medium based on graph neural network
CN111370127B (en) * 2020-01-14 2022-06-10 之江实验室 Decision support system for early diagnosis of chronic nephropathy in cross-department based on knowledge graph
CN111382272B (en) * 2020-03-09 2022-11-01 西南交通大学 Electronic medical record ICD automatic coding method based on knowledge graph
CN111834012A (en) * 2020-07-14 2020-10-27 中国中医科学院中医药信息研究所 Traditional Chinese medicine syndrome diagnosis method and device based on deep learning and attention mechanism
CN112037912B (en) * 2020-09-09 2023-07-11 平安科技(深圳)有限公司 Triage model training method, device and equipment based on medical knowledge graph
CN112263220A (en) * 2020-10-23 2021-01-26 北京文通图像识别技术研究中心有限公司 Endocrine disease intelligent diagnosis system
CN113409892B (en) * 2021-05-13 2023-04-25 西安电子科技大学 MiRNA-disease association relation prediction method based on graph neural network
CN113434626B (en) * 2021-08-27 2021-12-07 之江实验室 Multi-center medical diagnosis knowledge map representation learning method and system
CN113643821B (en) * 2021-10-13 2022-02-11 浙江大学 Multi-center knowledge graph joint decision support method and system
CN113990495B (en) * 2021-12-27 2022-04-29 之江实验室 Disease diagnosis prediction system based on graph neural network

Also Published As

Publication number Publication date
JP2024503980A (en) 2024-01-30
CN113990495B (en) 2022-04-29
WO2023124190A1 (en) 2023-07-06
CN113990495A (en) 2022-01-28

Legal Events

Date Code Title Description
2023-06-15 A621 Written request for application examination
2023-07-05 A871 Explanation of circumstances concerning accelerated examination
TRDD Decision of grant or rejection written
2024-02-19 A01 Written decision to grant a patent or to grant a registration (utility model)
2024-03-19 A61 First payment of annual fees (during grant procedure)
R150 Certificate of patent or registration of utility model (ref document number: 7459386; country of ref document: JP)