CN116797817A - Autism disease prediction technology based on self-supervision graph convolution model - Google Patents

Autism disease prediction technology based on self-supervision graph convolution model

Info

Publication number
CN116797817A
CN116797817A (application CN202310405718.8A)
Authority
CN
China
Prior art keywords
graph
node
matrix
image data
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310405718.8A
Other languages
Chinese (zh)
Inventor
李文欣 (Li Wenxin)
韩龙飞 (Han Longfei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University filed Critical Beijing Technology and Business University
Priority to CN202310405718.8A priority Critical patent/CN116797817A/en
Publication of CN116797817A publication Critical patent/CN116797817A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10088 Magnetic resonance imaging [MRI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The application provides an autism disease prediction technique based on a self-supervised graph convolution model, belonging to the field of artificial-intelligence biomedicine. The technique aims to improve the accuracy and efficiency of autism disease prediction. The technical scheme mainly comprises the following steps: a self-supervised learning method based on a graph convolutional network is adopted, and the graph convolutional neural network is trained on a large-scale unlabeled dataset; using a graph contrastive learning method, two graph views are generated by random enhancement of the graph's structure and attributes, forcing the model to be insensitive to perturbations of unimportant nodes and edges so that it recognizes the underlying semantic information, further improving prediction accuracy. The technique is mainly used to assist doctors in diagnosing and treating autism, improving prediction accuracy and efficiency and providing better medical services for patients.

Description

Autism disease prediction technology based on self-supervision graph convolution model
Technical Field
The application belongs to the field of disease prediction by deep learning, and particularly relates to graph contrastive learning within self-supervised learning, graph construction using a Euclidean distance algorithm, and prediction via spectral graph convolution. Spectral convolution exploits the properties of convolution in the Fourier domain, which can be extended to generic graphs by finding the corresponding "Fourier" basis. For low-dimensional graphs, convolutional layer parameters independent of the input size can be learned, resulting in an efficient deep structure.
Background
Convolutional neural networks have been successful in medical imaging. In image classification, deep neural networks play an important role in medical image analysis; for single-modality image data, image classification predicts whether a disease is present from an input image and produces an output classification. Combining image data and non-image data for disease diagnosis is an important task in clinical medicine, and multi-modal biomedical techniques promote computer-aided diagnosis; compared with single-modality data, multi-modal heterogeneous data can provide more information about a patient's condition, making the diagnosis more reliable.
Disclosure of Invention
In order to solve the above technical problems, the application aims to provide a graph convolutional network based on self-supervised graph contrastive learning to realize autism disease prediction. Contrastive learning forces the model to learn representations that are insensitive to perturbations of unimportant nodes and edges and to identify the underlying semantic information, which helps it learn better graph and node representations, make more accurate predictions from an input view, and effectively assist diagnosis. The method comprises the following steps:
Data preprocessing: each subject includes image data, a label, and non-image data. The image data is the subject's functional magnetic resonance imaging (fMRI); the subject label (dx_group) marks autism spectrum disorder patients as 1 and healthy controls as 2; the non-image data is personal statistical information such as the patient's age and sex. The matrix representation of the image data is obtained by loading a pre-computed fMRI functional connectivity network, and the label and other non-image data of each subject are obtained from the phenotype file. Finally, for the 871 subjects, 10-fold cross-validation is adopted: each fold in turn serves as the validation set while the remaining folds serve as the training set for training and validation, which reduces the risk of poor generalization.
Extracting a matrix representation of the non-image data: for the 871 subjects selected from the dataset, the corresponding non-image data are retrieved by subject number (subject_ids). Considering possible variability between data, normalization is used to map the embedded representations of different vector spaces into the same space. The adopted Z-score method, also called standard deviation normalization, processes the feature data using the mean and standard deviation of the original data, so that the feature values of different samples are converted to the same scale. The normalization can be expressed as:
$$x^{*} = \frac{x - \mu}{\sigma}$$
where μ is the mean of all raw sample data, σ is the standard deviation, and x*, y* denote the normalized sample data. After this processing the data lie in the same space, and the Euclidean metric is used to compute the pairwise association:
$$d(x^{*}, y^{*}) = \sqrt{\sum_{i=1}^{m} \left(x^{*}_{i} - y^{*}_{i}\right)^{2}}$$
where x*_i and y*_i are the feature vectors of the two subjects, respectively, and m-dimensional features are selected.
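As a minimal NumPy/SciPy sketch of the two formulas above (not the patent's own code), with `pheno` standing in for the selected m-dimensional phenotypic features of the 871 subjects and the distance-to-similarity conversion an illustrative choice:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

pheno = np.random.rand(871, 4)  # placeholder for age, sex, site, ... features

# Z-score (standard deviation) normalization: x* = (x - mu) / sigma, per feature column.
mu = pheno.mean(axis=0)
sigma = pheno.std(axis=0)
pheno_z = (pheno - mu) / sigma

# Pairwise Euclidean distance d(x*, y*) = sqrt(sum_i (x*_i - y*_i)^2) between subjects,
# giving an 871 x 871 association matrix over the non-image data.
dist = squareform(pdist(pheno_z, metric="euclidean"))

# Optionally convert distance to a similarity score (smaller distance -> larger similarity).
sim_pheno = np.exp(-dist ** 2 / (2 * dist.std() ** 2))
```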
Extracting a matrix representation of the image data in the multimodal data: the image data of each subject consists of fMRI functional magnetic resonance imaging. A pre-computed functional connectivity network is loaded first, so that the image data of each subject is represented by a 111 × 111 array; taking the upper triangular part of each matrix yields an array of dimension (871, 6105) representing the image data of all subjects. To reduce the dimensionality of these feature vectors, an initial value of C is randomly defined and a recursive feature elimination (RFE) selector, with a classifier as its estimator, extracts a C-dimensional feature vector representing each subject's functional connectivity from the fMRI data; the feature distance between every two subjects is then computed, finally yielding a similarity representation of the subjects.
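One plausible reading of this step, sketched below: vectorise the upper triangle of each 111 × 111 connectivity matrix (111·110/2 = 6105 entries) and apply recursive feature elimination (RFE) with a linear estimator to keep a C-dimensional connectivity vector per subject. The estimator, the value of C, and the placeholder arrays are illustrative assumptions:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)
connectivity = rng.random((871, 111, 111))  # placeholder pre-computed connectivity matrices
labels = rng.integers(0, 2, size=871)       # placeholder binary labels (dx_group relabelled to 0/1)

iu = np.triu_indices(111, k=1)              # upper triangle without the diagonal: 6105 entries
features = connectivity[:, iu[0], iu[1]]    # shape (871, 6105)

C = 2000  # illustrative initial value for the reduced feature dimension
selector = RFE(estimator=RidgeClassifier(), n_features_to_select=C, step=100)
features_c = selector.fit_transform(features, labels)  # shape (871, C)

# Subject-by-subject similarity of the image data, e.g. via correlation of the C-dim vectors.
sim_image = np.corrcoef(features_c)         # shape (871, 871)
```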
Enhanced graph contrastive learning: the similarity matrix obtained from the image data is combined with the similarity matrix obtained from the non-image data, the resulting matrix is regarded as a fully connected graph, and the graph data undergoes two kinds of data enhancement: topology-level enhancement and node-attribute-level enhancement.
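A minimal sketch of this fusion step, assuming the two 871 × 871 similarity matrices produced above; whether the combination is element-wise or a matrix product, and the value of the score threshold used to sparsify the fully connected graph (see claim 1, step S4), are illustrative choices:

```python
import numpy as np

def fuse_and_threshold(sim_image: np.ndarray, sim_pheno: np.ndarray, quantile: float = 0.9) -> np.ndarray:
    """Fuse the image-based and phenotype-based similarity matrices and keep only
    node pairs whose fused score exceeds a threshold, yielding the graph topology."""
    score = sim_image * sim_pheno              # element-wise fusion of the two 871 x 871 matrices
    threshold = np.quantile(score, quantile)   # illustrative score threshold
    adjacency = (score > threshold).astype(float)
    np.fill_diagonal(adjacency, 0.0)           # no self-loops in the topology
    return adjacency

adjacency = fuse_and_threshold(sim_image, sim_pheno)
```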
For topology-level enhancement, the probability of deleting an edge is set according to the centrality of its nodes; the importance of an edge is generally characterized by the nodes it connects, and unimportant edges are deleted with higher probability. A hyper-parameter is set to cap the proportion of edges removed, so that the enhancement does not destroy the structure of the graph.
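The sketch below illustrates centrality-weighted edge dropping in this spirit: edges attached to low-degree (less central) nodes are removed with higher probability, with a hyper-parameter p_e capping the drop probability. The exact weighting is an assumption in the style of adaptive graph augmentation, not the patent's specified formula:

```python
import numpy as np

def drop_edges_by_centrality(adjacency: np.ndarray, p_e: float = 0.3, rng=None) -> np.ndarray:
    """Remove edges with probability inversely related to the degree centrality of
    their endpoints, capped by the hyper-parameter p_e."""
    rng = np.random.default_rng() if rng is None else rng
    degree = adjacency.sum(axis=1)                       # degree centrality of each node
    rows, cols = np.triu_indices_from(adjacency, k=1)
    mask = adjacency[rows, cols] > 0                     # existing edges only
    rows, cols = rows[mask], cols[mask]

    # Edge importance ~ centrality of the nodes it connects; low importance -> high drop probability.
    importance = np.log1p(np.minimum(degree[rows], degree[cols]))
    drop_prob = p_e * (importance.max() - importance) / (importance.max() - importance.min() + 1e-12)

    keep = rng.random(len(rows)) > drop_prob
    aug = np.zeros_like(adjacency)
    aug[rows[keep], cols[keep]] = adjacency[rows[keep], cols[keep]]
    return aug + aug.T                                   # keep the augmented graph symmetric
```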
For node-attribute-level enhancement, a portion of the dimensions in the node features is randomly masked and random noise is added to the node attributes, finally generating the enhanced node features, expressed as:
$$\tilde{X} = \left[\, x_{1} \odot \tilde{m};\; x_{2} \odot \tilde{m};\; \dots;\; x_{N} \odot \tilde{m} \,\right]^{\top}$$
where [·;·] is the concatenation operator, ⊙ denotes element-wise multiplication, and $\tilde{m}$ is the random mask over the feature dimensions. The two views are generated by jointly performing the topology and node-attribute enhancements.
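A sketch of this attribute augmentation and of generating the two views with different mask/drop probabilities; all probability values are illustrative, and `adjacency` / `features_c` refer to the arrays built in the earlier sketches:

```python
import numpy as np

def augment_features(x: np.ndarray, p_m: float = 0.2, noise_std: float = 0.01, rng=None) -> np.ndarray:
    """Randomly mask a fraction p_m of the feature dimensions and add Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(x.shape[1]) > p_m).astype(x.dtype)   # binary mask m over feature dimensions
    noise = rng.normal(0.0, noise_std, size=x.shape)
    return x * mask + noise

# Two views generated by jointly augmenting topology and attributes with different probabilities.
adj_view1 = drop_edges_by_centrality(adjacency, p_e=0.2)
adj_view2 = drop_edges_by_centrality(adjacency, p_e=0.4)
x_view1 = augment_features(features_c, p_m=0.1)
x_view2 = augment_features(features_c, p_m=0.3)
```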
Disease prediction using spectral convolution: the node features and the two graph views obtained by setting different drop probabilities are input into the graph convolution network. Spectral graph convolution module: the spectral GCN is optimized with each of the two generated graph views to obtain two prediction results. The GCN model consists of three parts: a Chebyshev graph convolution layer L_G, a fusion block, and an MLP predictor. To alleviate the over-smoothing problem in GCNs, node-level skip connections are employed to fuse the hidden features at each depth.
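A minimal sketch of such a spectral GCN, assuming PyTorch Geometric's ChebConv as the Chebyshev graph convolution layer; the layer count, hidden size, and polynomial order K are illustrative, and the fusion block is realised here as a simple concatenation of the hidden features of every depth followed by an MLP predictor:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import ChebConv

class SpectralGCN(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64, n_classes: int = 2, n_layers: int = 3, K: int = 3):
        super().__init__()
        dims = [in_dim] + [hidden] * n_layers
        self.convs = nn.ModuleList(ChebConv(dims[i], dims[i + 1], K=K) for i in range(n_layers))
        # Fusion block: concatenate the hidden features of every depth (skip connections)
        # to alleviate over-smoothing, then classify with an MLP predictor.
        self.mlp = nn.Sequential(nn.Linear(hidden * n_layers, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))

    def forward(self, x, edge_index, edge_weight=None):
        hiddens = []
        for conv in self.convs:
            x = torch.relu(conv(x, edge_index, edge_weight))  # each layer followed by ReLU
            hiddens.append(x)
        return self.mlp(torch.cat(hiddens, dim=-1))           # fused multi-depth representation
```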
Calculating loss: losses are computed from the labels and each of the two prediction results, the loss function is back-propagated, and an Adam optimizer adjusts the model parameters along the gradient descent direction according to the parameter gradients. The prediction results are quantified using commonly used evaluation indicators.
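A sketch of this training step, assuming the augmented adjacency matrices and features from the earlier sketches, the SpectralGCN model defined above, and labels relabelled to 0/1 class indices; the learning rate and weight decay are illustrative:

```python
import torch
import torch.nn.functional as F
from torch_geometric.utils import dense_to_sparse

# Convert the two augmented graphs (from the earlier sketches) to sparse form.
x1, x2 = torch.tensor(x_view1, dtype=torch.float), torch.tensor(x_view2, dtype=torch.float)
ei1, ew1 = dense_to_sparse(torch.tensor(adj_view1, dtype=torch.float))
ei2, ew2 = dense_to_sparse(torch.tensor(adj_view2, dtype=torch.float))
y = torch.tensor(labels, dtype=torch.long)        # 0/1 class indices (relabelled dx_group)

model = SpectralGCN(in_dim=x1.size(1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)

def train_step(train_mask: torch.Tensor) -> float:
    model.train()
    optimizer.zero_grad()
    logits1 = model(x1, ei1, ew1)                 # prediction from view 1
    logits2 = model(x2, ei2, ew2)                 # prediction from view 2
    loss = (F.cross_entropy(logits1[train_mask], y[train_mask]) +
            F.cross_entropy(logits2[train_mask], y[train_mask]))
    loss.backward()                               # back-propagate the combined loss
    optimizer.step()                              # Adam moves parameters along the gradient direction
    return float(loss)
```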
In this application, based on the foregoing solution, the determining unit is configured as a graph convolutional neural network based on graph contrastive learning, and includes the following modules:
Data extraction and topology construction module: functional magnetic resonance image data of the 871 selected subjects in the ABIDE dataset and the required dimensions of the phenotype data are extracted; the required phenotype data are normalized and Euclidean distances are measured, and the result is combined with the image information to obtain a fully connected graph topology;
Graph contrast network module: contrastive constraints are applied between graph-level and node-level features to maximize mutual information; the data enhancement scheme preserves the inherent structure and properties of the graph and forces the model to learn representations that are insensitive to perturbations of unimportant nodes and edges.
Spectral graph convolution module: the spectral GCN is optimized with each of the two generated graph views to obtain two prediction results. The GCN model consists of three parts: a Chebyshev graph convolution layer L_G, a fusion block, and an MLP predictor. To alleviate the over-smoothing problem in GCNs, node-level skip connections are employed to fuse the hidden features at each depth.
The Chebyshev graph convolution module can implement local filtering on an irregular weighted graph at moderate computational cost, with each convolution layer followed by a ReLU activation function to increase nonlinearity.
The present application provides such a disease prediction technique: using the public ABIDE dataset, the original features, labels, and non-image features of the subjects are obtained and 10-fold cross-validation is used; a relation matrix between subjects is constructed with a Euclidean distance metric and fused with the image feature similarity matrix to obtain a more accurate matrix representation; enhanced graph contrastive learning applies two kinds of enhancement to the graph data, further improving the robustness of the graph; finally, spectral graph convolution is used for disease prediction and an MLP yields the final classification result.
Drawings
FIG. 1 is a schematic diagram of a method for enhancing data by graph contrast learning according to the present application;
FIG. 2 is a flow chart of disease prediction of the present application;
fig. 3 is a diagram of the overall architecture of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and the detailed description. The step numbers in the following implementation are provided for convenience of description only; the embodiments described below are only a part of the present application and should not be construed as limiting it to the examples described above.
S101: Data preprocessing: subject information is obtained from the ABIDE autism brain imaging database, where the subject information comprises image data and non-image data; the image information in the dataset is obtained from a pre-computed fMRI network. Finally, 10-fold cross-validation is used: each fold in turn is selected as the validation set and the rest as the training set for training and validation.
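A minimal sketch of the 10-fold split described in S101, with random placeholder arrays standing in for the pre-computed connectivity matrices and dx_group labels (the actual loading of the ABIDE data is not shown):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
connectivity = rng.random((871, 111, 111))   # placeholder for pre-computed fMRI connectivity
labels = rng.integers(1, 3, size=871)        # placeholder for dx_group (1 = ASD, 2 = control)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(connectivity.reshape(871, -1), labels)):
    # Each fold in turn is held out for validation; the rest is used for training.
    print(f"fold {fold}: {len(train_idx)} train subjects, {len(val_idx)} validation subjects")
```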
To increase the generalization performance of the model as much as possible through data augmentation, another public medical dataset, ADNI, used for the study of Alzheimer's disease was also considered; it comprises longitudinal brain MRI, PET data, and genetic and phenotypic information of 1700 adults. In this application, 1675 MCI (mild cognitive impairment) patients in that dataset were selected; for their image data, C = 138 segmented brain structure volumes were extracted from the MRI using MALP-EM, and the pairwise association was calculated using the phenotype data and the genetic data.
S102: Extracting a matrix representation of the non-image data: the required features of each dimension are selected from the phenotype data of the corresponding subjects, and a relation matrix between subjects is constructed with the Euclidean distance metric.
S103: Extracting a matrix representation of the image data in the multimodal data: the image data of each subject consists of fMRI functional magnetic resonance imaging; C-dimensional feature vectors representing each subject's functional connectivity are extracted from the fMRI, the similarity between subjects is computed, and a matrix representation is finally obtained.
S104: Enhanced graph contrastive learning: the similarity matrix of the non-image data from S102 and the similarity matrix of the image data from S103 are multiplied, and the resulting matrix is regarded as a fully connected graph. Two graph views are generated through random enhancement of the graph's structure and attributes; the two new graphs produced after enhancement are treated as views, which are encoded with a GCN and trained to obtain node representation vectors and graph representation vectors, further enhancing the robustness of the graph.
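The patent does not spell out the concrete contrastive objective; one common realisation of "maximize mutual information between the two views" is a node-level NT-Xent/InfoNCE loss between the GCN embeddings z1, z2 of the two views, sketched below as an assumption:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Node-level contrastive loss: the same node in the two views is a positive pair,
    all other nodes act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau                              # cosine similarities between the views
    targets = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets))
```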
S105: Disease prediction using spectral convolution: the node features and the two graph views obtained with different drop probabilities are input into the graph convolution network, and the final classification probability is output through an MLP neural network. The loss is calculated, and the experimental results are evaluated using accuracy, sensitivity, and F1-score.
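The evaluation indicators named in S105 can be computed with scikit-learn as below, where `y_true` and `y_pred` stand for the held-out labels and the predicted classes on a validation fold:

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

def evaluate(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "sensitivity": recall_score(y_true, y_pred, pos_label=1),  # recall of the positive class
        "f1": f1_score(y_true, y_pred, pos_label=1),
    }
```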
FIG. 1 shows the graph contrastive learning method used in the present application, in which the graph data undergoes two kinds of enhancement: topology-level enhancement and node-attribute-level enhancement.
Fig. 2 is a flow chart of the present application, processing is performed sequentially from top to bottom.
Fig. 3 is a general architecture diagram of the present application.
While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (4)

1. An autism disease prediction technique based on a self-supervised graph convolution model, characterized by comprising the following steps:
s1: data preprocessing: subject information is obtained in an Autism Brain Imaging Database (ABIDE), wherein each subject information includes image data and non-image data.
S2: extracting a matrix representation of the non-image data: the desired characteristics of each dimension are selected using the phenotype data of the corresponding subjects, and a Euclidean (Euclidean) metric algorithm is used to construct a relationship matrix between the subjects.
S3: extracting a matrix representation of image data in the multimodal data: the image data of each subject consists of fMRI functional magnetic resonance imaging, a feature vector representing a certain dimension of functional connectivity of the subjects is extracted from the obtained fMRI, the similarity of each subject is calculated, and finally, a matrix representation representing the similarity of the image data among the subjects is obtained.
S4: Enhanced graph contrastive learning: the similarity matrix of the non-image data in S2 and the similarity matrix of the image data in S3 are multiplied; the resulting matrix is a fully connected graph and serves as a scoring matrix between node pairs. Node pairs whose score exceeds a certain threshold are retained, yielding the adjacency matrix (graph topology); the graph topology is enhanced with a centrality algorithm, and the node attributes are enhanced by masking node features and randomly adding noise, obtaining a node representation vector and a graph representation vector; different enhancement probabilities are set to generate two new views, further enhancing the robustness of the graph.
S5: Disease prediction using spectral convolution: the node features and the two views obtained with different drop probabilities are input into a graph convolutional network (GCN), the final classification probability is output through a fully connected layer, the loss is computed against the true classification labels to train the graph convolutional neural network, and the model's prediction result is inferred.
2. The autism disease prediction technique based on the self-supervised graph convolution model according to claim 1, wherein step S4 is characterized in that: first, phenotype data of selected dimensions is taken from each subject, and the Euclidean metric is used to compute the true distance between two points in an m-dimensional space, obtaining the matrix representation of the non-image data; then, matrix multiplication with the image matrix of each subject yields a fully connected graph used as a scoring matrix between node pairs; node pairs above a certain score threshold are retained, resulting in an adjacency matrix (graph topology) of the graph; the graph contrast network first strengthens the graph structure with a centrality algorithm: the degree of each node is calculated from the adjacency matrix to judge its importance in the graph structure, a larger degree indicating a more important node, and the probability of deleting an edge is set according to node centrality to obtain a new graph topology; secondly, the node attributes are enhanced by randomly masking a portion of the dimensions in the node features and randomly adding noise to the node attributes, finally generating a new node feature matrix; a new view is generated through the random enhancement of the graph structure and attributes, and two views with different enhancement probabilities are generated for graph contrastive learning; the information learned by the graph contrast module improves the discriminability of the feature representation.
3. The autism disease prediction technique based on the self-supervised graph convolution model according to claim 1, wherein step S5 is characterized in that: the graph convolutional neural network (GCN) used can effectively integrate the node features of the network and the topological structure of the graph; the adjacency matrix and the feature matrix are the two most important inputs of the GCN, and their quality influences the classification performance of the model.
4. An autism disease prediction technique based on a self-supervised graph convolution model, comprising:
A data enhancement module: data enhancement is applied to the graph topology using a centrality algorithm, and noise is added to the node features to randomly enhance the node attributes, generating a new view;
A graph contrast network module: different random enhancement probabilities (the probability of deleting edges and the probability of adding noise to multi-dimensional node attributes) are set for the new views, contrastive constraints are applied between graph and node features to maximize mutual information, and the inherent structure and properties of the graph are preserved using the data enhancement scheme, which forces the model to learn representations insensitive to perturbations of unimportant nodes and edges.
CN202310405718.8A 2023-04-17 2023-04-17 Autism disease prediction technology based on self-supervision graph convolution model Pending CN116797817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310405718.8A CN116797817A (en) 2023-04-17 2023-04-17 Autism disease prediction technology based on self-supervision graph convolution model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310405718.8A CN116797817A (en) 2023-04-17 2023-04-17 Autism disease prediction technology based on self-supervision graph convolution model

Publications (1)

Publication Number Publication Date
CN116797817A true CN116797817A (en) 2023-09-22

Family

ID=88045872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310405718.8A Pending CN116797817A (en) 2023-04-17 2023-04-17 Autism disease prediction technology based on self-supervision graph convolution model

Country Status (1)

Country Link
CN (1) CN116797817A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409978A (en) * 2023-12-15 2024-01-16 贵州大学 Disease prediction model construction method, system, device and readable storage medium
CN117409978B (en) * 2023-12-15 2024-04-19 贵州大学 Disease prediction model construction method, system, device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination