CN114494175A - Interactive space segmentation method for mass spectrum imaging data - Google Patents

Interactive space segmentation method for mass spectrum imaging data

Info

Publication number
CN114494175A
Authority
CN
China
Prior art keywords
segmentation
data
msi
unsupervised
imaging data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210072775.4A
Other languages
Chinese (zh)
Other versions
CN114494175B (en)
Inventor
董继扬
郭磊
胡振兴
徐向南
许晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202210072775.4A
Publication of CN114494175A
Application granted
Publication of CN114494175B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G01N27/64 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using wave or particle radiation to ionise a gas, e.g. in an ionisation chamber
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10056 Microscopic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30024 Cell structures in vitro; Tissue sections in vitro
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30096 Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Toxicology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Electrochemistry (AREA)
  • Evolutionary Biology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Pathology (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

An interactive spatial segmentation method for mass spectrometry imaging data, relating to the technical field of mass spectrometry imaging data analysis. The method comprises the following steps: preprocessing the raw data by spectral peak alignment, spectral peak extraction, spectral peak merging and the like to obtain mass spectrometry imaging data; performing unsupervised dimensionality reduction on the mass spectrometry imaging data to obtain an embedded image; constructing an unsupervised segmentation model to pre-segment the embedded image; representing the user's prior knowledge as scribble information that imposes a regularization constraint on the segmentation model; and interactively fine-tuning the model in regions where the pre-segmentation result is unreasonable to obtain a more reasonable segmentation result. The method makes full use of the user's incomplete and inaccurate prior knowledge and improves the plausibility of the segmentation of mass spectrometry imaging data; it avoids the unstable or research-irrelevant results that an unsupervised segmentation model can produce, and improves the reliability and accuracy of subsequent analysis of the mass spectrometry imaging data.

Description

Interactive space segmentation method for mass spectrum imaging data
Technical Field
The invention relates to the technical field of Mass Spectrometry Imaging (MSI) data analysis, in particular to an interactive space segmentation method for mass spectrometry imaging data.
Background
Mass spectrometry imaging (MSI) combines the molecular information of mass spectrometry with the spatial information of imaging. It allows in situ qualitative, quantitative and localization analysis of many kinds of biomolecules in a biological sample, such as proteins, lipids and small-molecule metabolites, and offers label-free operation, high throughput and high sensitivity. It is an important analytical tool for drug development, tumor heterogeneity analysis and the screening of biomarkers related to disease progression. The spatial resolution of current advanced MSI instruments reaches 20 μm or finer, and each spectrum (pixel) contains 10^4 to 10^5 ion peaks. MSI can therefore provide a massive amount of biomolecular information, but the high dimensionality, low signal-to-noise ratio and small sample size of MSI data pose great challenges for subsequent biomedical analysis and interpretation.
Spatial segmentation is a key preprocessing step in MSI data analysis, and divides pixels with similar spectral features and spatial distribution in MSI data into the same cluster to form a spatial segmentation image, so that a basis is provided for identification of regions of interest (ROIs) of a sample, and a powerful tool is provided for visualization of high-dimensional MSI data. Accurate spatial segmentation is the key to subsequent analysis and biological interpretation of MSI data, and can provide theoretical basis for mining potential molecular patterns in biological samples.
MSI spatial segmentation algorithms fall into two broad categories, supervised and unsupervised. Supervised methods use known label data to build and train a classification model and then predict the classes of unlabeled data. Their segmentation results are usually highly interpretable, but the accuracy and completeness of the labels strongly affect the reliability of the result. In real MSI data, label data are often missing, incomplete or inaccurate, so supervised methods are of limited use in MSI data analysis. Unsupervised methods, by contrast, segment using only the spectral features and spatial distribution of the pixels themselves and need no additional labels, so they are more practical for high-dimensional MSI data and are widely applied in practice. For example, Abdelmoula et al. (W. M. Abdelmoula et al., PNAS, 2016) performed unsupervised spatial segmentation of MSI data by combining t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction with k-means clustering, and the resulting segmentation revealed spatial heterogeneity of proteins in different micro-regions of breast and gastric cancers.
However, MSI data are very complex: the molecular signal intensity in each pixel is determined not only by the tissue sample itself but also by technical factors such as ionization efficiency and matrix effects. Segmentation results produced by unsupervised methods without guidance therefore often fail to represent MSI data effectively, in two main respects. On the one hand, unsupervised methods are biased by design towards segmentation results that contain large sub-regions; when the algorithm pursues global optimality, small but important sub-regions of the tissue sample are often merged into one large region. On the other hand, unsupervised methods often capture meaningless regions created by technical factors, which are usually unrelated to the biological question under study. These limitations greatly restrict the application of unsupervised methods. A method is therefore needed that improves unsupervised segmentation and yields results that are more stable and more relevant to the research question.
In actual research, although unsupervised methods make no assumption about prior knowledge of the MSI data, a good deal of incomplete information about the sample is usually available, such as partial region boundaries and background noise. Using this incomplete prior knowledge effectively to improve unsupervised results is a good solution, but it has rarely been reported.
Disclosure of Invention
The invention aims to provide an interactive spatial segmentation method for mass spectrometry imaging data that is flexible and stable, interacts with the user in a friendly way through scribbles, and effectively addresses the problems of existing unsupervised segmentation methods, such as unstable results and segmentation results that are irrelevant to the research question.
The invention comprises the following steps:
1) sample preparation and mass spectrometry imaging data acquisition: cryosectioning the biological sample to be tested to obtain a histological section, performing a matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) experiment on the section, and collecting the raw mass spectrometry imaging (MSI) data;
2) data preprocessing: exporting MSI original data acquired in the step 1) from an instrument, and performing data preprocessing to remove matrix noise introduced in the sample preparation and data acquisition processes to obtain high-dimensional MSI data X;
3) data dimensionality reduction: mapping the high-dimensional MSI data X to a low-dimensional embedding space with an unsupervised dimensionality reduction algorithm to obtain an embedded image E;
4) unsupervised segmentation: constructing a deep-neural-network segmentation model and training its parameters without supervision on the embedded image E obtained in step 3) to obtain a pre-segmentation result of the embedded image E;
5) regularized fine-tuning of the model: according to the user's prior knowledge, creating scribble information in the regions where the pre-segmentation result is unreasonable to obtain a scribble image, converting the scribble information into a regularization constraint term, and fine-tuning the parameters of the segmentation model to obtain an improved spatial segmentation result.
In step 1), the sample preparation and mass spectrometry imaging data acquisition steps may be:
(1) sectioning the sample to be tested by direct cryosectioning, thaw-mounting the sections on cold indium tin oxide (ITO)-coated conductive glass slides, and storing them in a freezer at -80 °C until the MSI experiment;
(2) during the MSI analysis experiment, the glass slide is taken out and placed in a vacuum chamber for 30min for unfreezing and dehydration, after dehydration, a matrix of N- (1-naphthyl) -ethylenediamine dihydrochloride (NEDC) is uniformly sprayed on the surface of the slice, and a MALDI-TOF mass spectrometer is used for data acquisition.
In step 2), the data preprocessing comprises spectral peak alignment, spectral peak extraction, spectral peak merging and similar processing, so that the mass-to-charge ratio (m/z) coordinates of the spectral peaks are consistent across all pixels; rarely occurring spectral peaks are then deleted to obtain the high-dimensional MSI data set X.
The data preprocessing may specifically be performed as follows:
(1) converting the raw MSI data with the SCiLS Lab software supplied with the MSI instrument and exporting them as an imzML file;
(2) reading the imzML file and performing preprocessing such as spectral peak alignment, spectral peak extraction and spectral peak merging, so that the m/z coordinates of the spectral peaks are consistent across all pixels;
(3) deleting ion peaks that occur in fewer than 5% of the pixels with a custom Python script to obtain the preprocessed high-dimensional MSI data set X.
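The frequency filter in sub-step (3) can be sketched as follows. This is a minimal illustration, not the inventors' script: the function name `filter_rare_peaks`, the representation of each pixel as a dictionary from m/z bin to intensity, and the rule that a peak "occurs" when its intensity is non-zero are all assumptions made for the example.

```python
def filter_rare_peaks(pixels, min_fraction=0.05):
    """Drop m/z peaks that occur in fewer than `min_fraction` of the pixels.

    `pixels` is a list of dicts mapping an m/z bin to an intensity
    (hypothetical layout). Returns the kept m/z values and the dense
    pixels-by-peaks matrix X.
    """
    n_pixels = len(pixels)
    if n_pixels == 0:
        return [], []
    # Count in how many pixels each m/z bin has a non-zero intensity.
    counts = {}
    for spectrum in pixels:
        for mz, intensity in spectrum.items():
            if intensity > 0:
                counts[mz] = counts.get(mz, 0) + 1
    kept = sorted(mz for mz, c in counts.items() if c / n_pixels >= min_fraction)
    # Build the dense matrix, filling peaks absent from a pixel with 0.
    X = [[spectrum.get(mz, 0.0) for mz in kept] for spectrum in pixels]
    return kept, X


# 40 pixels: the peak at 400.2 occurs in only 1 of them (2.5%) and is dropped.
pixels = [{300.1: 5.0, 400.2: 1.0}] + [{300.1: 2.0} for _ in range(39)]
kept, X = filter_rare_peaks(pixels)
```

Each row of `X` then holds one pixel's intensities on the shared m/z axis, which is the layout a dimensionality reduction step would expect.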
In step 3), the dimensionality reduction uses an unsupervised method such as uniform manifold approximation and projection (UMAP).
In step 4), the specific steps of the unsupervised segmentation may be:
(1) constructing a segmentation model based on a deep neural network and designing a corresponding loss function, so that pixels with similar features and adjacent positions are assigned to the same cluster as far as possible while the number of clusters in the segmentation result is kept as large as possible;
(2) pre-training the segmentation model with the embedded image E as the training sample to obtain a response matrix R;
(3) taking, for each pixel, the index of the channel with the maximum value in the response matrix R as the label of that pixel to obtain the pre-segmentation result, in which each label corresponds to one cluster.
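Sub-step (3) is a per-pixel argmax over the channels of the response matrix. A minimal sketch, assuming R is stored as a nested list indexed as `R[a][b][i]` (row, column, channel); the function name and the storage layout are illustrative choices, not taken from the patent.

```python
def labels_from_response(R):
    """Return the label map: for each pixel, the index of its largest channel.

    Ties are resolved in favour of the lowest channel index.
    """
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in R
    ]


R = [
    [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]],
    [[0.2, 0.2, 0.6], [0.3, 0.3, 0.3]],
]
labels = labels_from_response(R)
n_clusters = len({c for row in labels for c in row})
```

The number of distinct labels in the map is the number of clusters actually used, which may be smaller than the number of output channels.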
In step 5), the specific steps of the regularized fine-tuning may be:
(1) creating a blank scribble image with the pre-segmentation result as the template; based on prior knowledge, the user draws scribbles at the positions of the unreasonable clusters in the pre-segmentation result to obtain the scribble image;
(2) converting the scribble image into a regularization constraint term and adding it to the loss function of the segmentation model pre-trained in step 4), so that pixels with the same scribble label are assigned to the same cluster as far as possible and pixels with different scribble labels are assigned to different clusters;
(3) fine-tuning the parameters of the pre-trained segmentation model, now carrying the regularization constraint, with the embedded image E to obtain an improved segmentation result.
The invention provides an interactive MSI spatial segmentation strategy built on an unsupervised segmentation model with two-stage learning. Incomplete and inaccurate prior knowledge is converted into scribble information that regularizes the loss function of the segmentation model, which can greatly improve the validity of the model's segmentation results. The method is flexible and stable, interacts with the user in a friendly way through scribbles, and effectively addresses the problems of existing unsupervised segmentation methods, such as unstable results and segmentation results that are irrelevant to the research question. The invention is expected to promote the wide application of MSI technology.
Compared with the prior art, the invention has the beneficial effects that:
1) by introducing a novel regularization strategy, interactive segmentation of MSI data is realized, and a more reasonable segmentation result is obtained;
2) inaccurate and incomplete prior knowledge is expressed as scribble information and converted into a regularization constraint on the segmentation model, which improves the validity of the model's segmentation results;
3) by utilizing transfer learning, a user can interact with the model for multiple times, and more flexible space segmentation is realized.
Drawings
FIG. 1 is a system block diagram of an embodiment of the invention.
FIG. 2 is a diagram illustrating the effect of the interactive segmentation performed in the present invention.
Detailed Description
In order to make the technical advantages of the present invention more clear, the following embodiments will further describe the present invention with reference to the accompanying drawings.
The invention provides an interactive segmentation method of mass spectrum imaging data, which comprises the following steps:
S1: cryosectioning the biological sample to be tested to obtain a histological section, performing a matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) experiment on the section, and collecting the raw MSI data;
S2: exporting the raw MSI data acquired in S1 from the instrument and preprocessing them by spectral peak extraction, spectral peak identification and spectral peak merging to remove matrix noise and other artifacts introduced during sample preparation and data acquisition in S1, yielding a high-dimensional MSI data set for subsequent processing;
S3: projecting the high-dimensional MSI data set into a low-dimensional embedding space with a dimensionality reduction algorithm, such as uniform manifold approximation and projection (UMAP), to obtain a low-dimensional embedded image;
S4: constructing an unsupervised segmentation model based on a deep neural network and pre-training it with the embedded image obtained in S3 to obtain a pre-segmentation result;
S5: creating scribble information in the regions where the pre-segmentation result is unreasonable according to the user's prior knowledge, converting the scribble information into a regularization constraint term, and fine-tuning the parameters of the segmentation model to obtain an improved segmentation result.
Further, the sample preparation and data collection in S1 are as follows:
a1: sectioning the sample to be tested by direct cryosectioning, thaw-mounting the sections on ITO-coated conductive glass slides, and storing them in a freezer at -80 °C until the MSI experiment;
a2: during the MSI analysis experiment, the glass slide is taken out and placed in a vacuum chamber for 30min to be defrosted and dehydrated, then a matrix of N- (1-naphthyl) -ethylenediamine dihydrochloride (NEDC) is uniformly sprayed on the surface of the slice, and a MALDI-TOF mass spectrometer is used for data acquisition.
Further, the data preprocessing in S2 is as follows:
a1: converting the raw MSI data with the SCiLS Lab software supplied with the MSI instrument and exporting them as an imzML file;
a2: reading the imzML file and performing preprocessing such as spectral peak alignment, spectral peak extraction and spectral peak merging, so that the mass-to-charge ratio (m/z) coordinates of the spectral peaks are consistent across all pixels;
a3: deleting ion peaks that occur in fewer than 5% of the pixels with a custom Python script to obtain the preprocessed high-dimensional MSI data set.
Further, the process of unsupervised spatial segmentation in S4 is as follows:
a1: constructing a segmentation model based on a deep neural network and designing a corresponding loss function, so that pixels with similar features and adjacent positions are assigned to the same cluster as far as possible while the number of clusters in the segmentation result is kept as large as possible;
a2: pre-training the segmentation model with the embedded image as the training sample to obtain a response matrix;
a3: taking, for each pixel, the index of the channel with the maximum value in the response matrix as the label of that pixel to obtain a pre-segmentation result, in which each label corresponds to one cluster.
Further, the parameter fine-tuning of the segmentation model in S5 proceeds as follows:
a1: creating a blank scribble image with the segmentation result as the template; based on prior knowledge, the user draws scribbles at the positions of the unreasonable clusters in the segmentation result to obtain the scribble image;
a2: converting the scribble information into a regularization constraint term and adding it to the loss function of the segmentation model pre-trained in S4, so that pixels with the same scribble label are assigned to the same cluster as far as possible and pixels with different scribble labels are assigned to different clusters;
a3: fine-tuning the parameters of the pre-trained model with the embedded image to obtain an improved segmentation result.
Specific examples are given below.
FIG. 1 is a system block diagram of an embodiment of the invention, in which X is the preprocessed high-dimensional MSI data set, E is the low-dimensional embedded image and R is the multi-channel response matrix; the symbols for the scribble information and for the segmentation result appear only as images in the source text.
The embodiment comprises the following steps:
1. sample preparation and data acquisition:
(1) The tissue sample used in this embodiment comes from a mouse raised at a research institute. After the mouse was sacrificed, the specimen was snap-frozen in liquid nitrogen and stored in a freezer at -80 °C. The specimen was sectioned with the CryoStar Nx79 instrument (Thermo Fisher Scientific, Germany) to give frozen sections 10 μm thick.
(2) The sections were thaw-mounted on cold indium tin oxide (ITO)-coated conductive slides and dehydrated in a vacuum chamber for 30 min. After dehydration, N-(1-naphthyl)-ethylenediamine dihydrochloride (NEDC) was sprayed uniformly onto the surface of the sections as the matrix. Data were acquired in negative ion mode using Rapiflex MALDI tissue (Bruker Daltonics, Germany), and the mass-to-charge ratio (m/z) range of the ion peaks to be acquired was set to 250-.
2. Preprocessing the data and exporting it as a high-dimensional MSI data set X:
(1) The collected raw data are converted into imzML format with the SCiLS Lab software supplied with the MSI instrument.
(2) The imzML data file is read with the open-source R package 'MALDIquant', and preprocessing such as spectral peak alignment, spectral peak extraction and spectral peak merging is performed. The preprocessed MSI data are exported as csv files, each of which corresponds to one pixel in the raw data and contains the spectral peak positions and the corresponding intensities acquired by the instrument. Spectral peak alignment, extraction and merging proceed as follows:
Spectral peak alignment: the ion spectral peaks of each pixel in the MSI data are aligned with the reference spectral peaks, correcting the random mass-to-charge ratio drift of the same ion peak across different pixels. The reference is selected as the one having the highest correlation with the others.
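One way to read the reference selection rule is sketched below. It assumes that the correlation is computed between per-pixel spectra sampled on a common m/z axis and that "highest correlation with the others" means the highest mean Pearson correlation; both are interpretations, and the function names are hypothetical. The actual alignment in the embodiment is carried out by MALDIquant, which this sketch does not reproduce.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def pick_reference(spectra):
    """Index of the spectrum with the highest mean correlation to all others."""
    best_idx, best_score = 0, -math.inf
    for i, s in enumerate(spectra):
        others = [pearson(s, t) for j, t in enumerate(spectra) if j != i]
        score = sum(others) / len(others) if others else 0.0
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


# Three similar spectra and one outlier (index 3) with the opposite trend.
spectra = [[1, 2, 3, 4], [1, 2, 3, 5], [2, 2, 3, 4], [4, 3, 2, 1]]
reference = pick_reference(spectra)
```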
Spectral peak extraction: the ion spectrum of each pixel in the MSI data is converted into a list containing only the mass-to-charge ratio and the corresponding intensity at each peak position. A detected peak must satisfy two conditions: first, it must be a local maximum within a given window (size 20 ppm); second, its intensity must be more than twice the noise level, where the noise level is estimated by the median absolute deviation (MAD).
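The two conditions can be illustrated with a small peak picker. This is a sketch under stated assumptions: the 20 ppm window is taken as the total width centred on the candidate, the noise level is the unscaled MAD of the whole spectrum, and the function names are hypothetical. The embodiment itself relies on MALDIquant, whose exact windowing and noise scaling may differ.

```python
import statistics


def mad_noise(intensities):
    """Median absolute deviation of the intensities, used as the noise level."""
    med = statistics.median(intensities)
    return statistics.median([abs(v - med) for v in intensities])


def pick_peaks(mz, intensity, window_ppm=20.0, snr=2.0):
    """Return (m/z, intensity) pairs that are local maxima above snr * noise."""
    noise = mad_noise(intensity)
    peaks = []
    for k, (m, v) in enumerate(zip(mz, intensity)):
        half = m * window_ppm * 1e-6 / 2  # half window width in m/z units
        neighbours = [
            intensity[j]
            for j in range(len(mz))
            if j != k and abs(mz[j] - m) <= half
        ]
        # Condition 1: strict local maximum inside the window.
        # Condition 2: intensity more than `snr` times the noise level.
        if all(v > u for u in neighbours) and v > snr * noise:
            peaks.append((m, v))
    return peaks


mz = [500.0 + 0.001 * k for k in range(11)]
intensity = [1, 2, 3, 2, 1, 20, 1, 2, 3, 2, 1]
peaks = pick_peaks(mz, intensity)
```

The quadratic neighbour search keeps the sketch short; a real spectrum with tens of thousands of points would call for a sliding window instead.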
Spectral peak merging: the ion peaks of each pixel that fall within a given mass-to-charge ratio range are merged into one peak whose intensity is the sum of the intensities of all ion peaks in that range.
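A minimal sketch of the merging rule follows. The source does not state the width of the m/z range, so the 50 ppm tolerance here is a placeholder, and placing the merged peak at the intensity-weighted mean m/z is likewise an assumption; only the summing of intensities comes from the text.

```python
def _collapse(group):
    """Collapse a group of (m/z, intensity) peaks into one summed peak."""
    total = sum(i for _, i in group)
    # Intensity-weighted centroid (assumed); fall back to the first m/z.
    centroid = sum(m * i for m, i in group) / total if total else group[0][0]
    return (centroid, total)


def merge_peaks(peaks, tol_ppm=50.0):
    """Merge peaks lying within `tol_ppm` of the first peak of their group."""
    merged, group = [], []
    for mz, inten in sorted(peaks):
        if group and (mz - group[0][0]) > group[0][0] * tol_ppm * 1e-6:
            merged.append(_collapse(group))
            group = []
        group.append((mz, inten))
    if group:
        merged.append(_collapse(group))
    return merged


merged = merge_peaks([(500.000, 2.0), (500.010, 6.0), (600.0, 1.0)])
```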
(3) The exported csv files are merged with a custom Python script, the fraction of pixels in which each ion peak occurs is counted, and ion peaks occurring in fewer than 5% of the pixels are deleted to obtain the preprocessed MSI data set X.
3. The preprocessed high-dimensional MSI data set X is mapped into a low-dimensional space with the unsupervised dimensionality reduction algorithm UMAP to obtain the low-dimensional embedded image E of the MSI data.
4. The unsupervised segmentation model is pre-trained with the low-dimensional embedded image E to obtain a pre-segmentation result:
(1) A deep-neural-network segmentation model comprising N CNN modules is constructed. Each of the first N-1 modules comprises one 2D convolutional layer with a 3 × 3 kernel, one ReLU activation layer and one batch normalization (BN) layer; the last module comprises one 2D convolutional layer with a 1 × 1 kernel and one BN layer.
(2) The segmentation model is trained with the low-dimensional embedded image E to obtain a multi-channel response matrix R; for each pixel, the index of the channel holding the maximum value of R is taken as the cluster label of that pixel, which gives the pre-segmentation result.
(3) The loss function of the segmentation model is designed as equation (1), which combines three component losses with the weights $\omega_1$, $\omega_2$ and $\omega_3$; the three components are defined in equations (2) to (4). Equations (1) to (4) appear only as images in the source text and are not reproduced here. In these equations, a and b denote the abscissa and ordinate of a pixel in the MSI data, i denotes the channel of the pixel, and $H = (c_{a,b,i})_{A \times B}$ is the one-hot encoding of the initial segmentation result. The first component drives pixels with similar features into the same cluster as far as possible, the second drives spatially adjacent pixels into the same cluster as far as possible, and the third keeps the number of segmented regions as large as possible.
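Because the patent's own formulas are not recoverable from the text, the following is only an illustration of how the first two goals are commonly expressed, not the inventors' equations. It assumes a feature-similarity term given by the cross-entropy between the softmax of the response and the one-hot argmax labels, and a spatial-continuity term given by the mean absolute difference between the responses of adjacent pixels. The third term is omitted, and the function names and default weights are hypothetical.

```python
import math


def softmax(v):
    """Numerically stable softmax of one pixel's channel responses."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def similarity_loss(R):
    """Mean cross-entropy between softmax(R) and the one-hot argmax labels."""
    losses = [-math.log(max(softmax(px))) for row in R for px in row]
    return sum(losses) / len(losses)


def continuity_loss(R):
    """Mean absolute response difference between adjacent pixels."""
    diffs = []
    A, B = len(R), len(R[0])
    for a in range(A):
        for b in range(B):
            if a + 1 < A:  # vertical neighbour
                diffs += [abs(x - y) for x, y in zip(R[a + 1][b], R[a][b])]
            if b + 1 < B:  # horizontal neighbour
                diffs += [abs(x - y) for x, y in zip(R[a][b + 1], R[a][b])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def partial_loss(R, w1=1.0, w2=1.0):
    """Weighted sum of the two illustrated terms (weights are placeholders)."""
    return w1 * similarity_loss(R) + w2 * continuity_loss(R)


flat = [[[2.0, 0.0], [2.0, 0.0]], [[2.0, 0.0], [2.0, 0.0]]]
rough = [[[1.0, 0.0], [0.0, 1.0]]]
```

A spatially uniform response such as `flat` has zero continuity loss, whereas `rough`, whose two neighbouring pixels prefer different channels, is penalized.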
(4) The segmentation model is pre-trained with a stochastic gradient descent (SGD) optimizer, using a learning rate of 0.01 and a momentum of 0.9; the model parameters are updated by back-propagation.
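For reference, one SGD-with-momentum update can be written out explicitly. The sketch uses the common formulation in which the velocity accumulates the gradient and the parameter moves against the velocity; the source does not say which variant or framework was used, so this convention and the function name are assumptions. The toy objective f(w) = w² stands in for the segmentation loss.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One update: v <- momentum * v + g, then p <- p - lr * v."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Two steps on f(w) = w**2, whose gradient is 2 * w.
params, velocity = [1.0], [0.0]
for _ in range(2):
    grads = [2.0 * w for w in params]
    params, velocity = sgd_momentum_step(params, grads, velocity)
```

After the first step the parameter is 0.98; after the second it is 0.9424, because the velocity carries over 90% of the previous step.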
5. The scribble information created through user interaction is used as a regularization constraint term to fine-tune the parameters of the segmentation model and obtain an improved segmentation result:
(1) A blank scribble image is created with the pre-segmentation result as the template. Based on prior knowledge, the user draws scribbles at the positions of the unreasonable clusters in the pre-segmentation result to obtain the scribble image, where a scribble is a set of continuous or discrete pixels with a defined class.
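The scribble image can be represented very simply, as sketched below. The sentinel value `UNLABELED`, the function names and the nested-list layout are hypothetical. The sketch also derives the one-hot encoding over q classes and a per-pixel indicator that is 1 on scribbled pixels and 0 elsewhere, the two quantities that the regularization term described later refers to.

```python
UNLABELED = -1  # hypothetical marker for pixels the user did not scribble on


def blank_scribble(segmentation):
    """Blank scribble image with the same shape as the segmentation result."""
    return [[UNLABELED for _ in row] for row in segmentation]


def draw(scribble, pixels, cls):
    """Assign class `cls` to the listed (row, column) pixels in place."""
    for a, b in pixels:
        scribble[a][b] = cls


def one_hot(scribble, q):
    """Return the one-hot encoding G[a][b][i] and the indicator k[a][b]."""
    G = [[[1 if c == i else 0 for i in range(q)] for c in row] for row in scribble]
    k = [[1 if c != UNLABELED else 0 for c in row] for row in scribble]
    return G, k


segmentation = [[0, 0, 1], [0, 1, 1]]
scribble = blank_scribble(segmentation)
draw(scribble, [(0, 0), (0, 1)], 0)  # a stroke of class 0
draw(scribble, [(1, 2)], 2)          # a single pixel of class 2
G, k = one_hot(scribble, q=3)
```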
(2) The scribble image is converted into a regularization term and added to the loss function of the segmentation model, and the model parameters are fine-tuned to obtain the segmentation result after scribble interaction.
(3) Regularization item of doodle information in training process
Figure BDA00034828705300000724
The design is as follows:
Figure BDA00034828705300000725
wherein G ═ Gab,i)A×B×qIs a graffiti image
Figure BDA00034828705300000726
The one-hot encoding of (1). The regularization constraint term will make pixels with the same graffiti marking as partitioned into the same cluster as possible.
The regularization term is added to the loss function of the model, namely:
[equation image: formula (6)]
where ω1, ω2, ω3 and the other symbols carried over from formula (1) are as defined there; ω4 is the weight of the regularization term; k = 1 if the pixel is contained in a graffiti, and k = 0 otherwise. The parameters of the segmentation model are fine-tuned using the new loss function, formula (6).
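Because the formula itself is not reproduced here, the following is only a sketch of one plausible form, not the formula of the invention: it assumes the graffiti term is a cross-entropy between the softmax of the model response and the one-hot graffiti labels, averaged over the pixels with k = 1, and that it enters the loss with weight ω4 next to the three pre-training terms. All function names and numeric values are hypothetical.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of one pixel's q channel responses."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]


def graffiti_regularizer(response, one_hot):
    """Mean cross-entropy over graffiti pixels (k = 1); k = 0 pixels are skipped."""
    loss, marked = 0.0, 0
    for resp_row, g_row in zip(response, one_hot):
        for scores, g in zip(resp_row, g_row):
            k = 1 if any(g) else 0
            if k == 0:
                continue
            log_p = log_softmax(scores)
            loss -= sum(gi * lp for gi, lp in zip(g, log_p))
            marked += 1
    return loss / marked if marked else 0.0


def total_loss(base_terms, weights, reg, w4):
    """Weighted sum of the pre-training terms plus w4 times the graffiti term."""
    return sum(w * t for w, t in zip(weights, base_terms)) + w4 * reg


# Hypothetical 1 x 2 image with q = 2 channels; only the first pixel is marked.
response = [[[2.0, 0.0], [0.0, 0.0]]]
one_hot = [[[1, 0], [0, 0]]]
reg = graffiti_regularizer(response, one_hot)
loss = total_loss([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], reg, w4=0.5)
```

Unmarked pixels contribute nothing to the graffiti term, so they remain governed by the pre-training terms alone.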
(4) The segmentation model is fine-tuned starting from the pre-trained parameters, using an SGD optimizer with back-propagation at a learning rate of 0.001 and a momentum of 0.9.
In an embodiment of the invention, MSI data of a mouse fetus are segmented interactively. Because a mouse fetus sample contains multiple organs and its MSI data are highly complex, spatial segmentation of these data is a difficult task. FIG. 2 shows the segmentation result of the unsupervised segmentation model without a graffiti constraint. Some organs and sub-organs are not accurately delineated, for example the hippocampus, midbrain and brainstem in the brain region, as well as the tongue and heart. After several rounds of graffiti interaction, the segmentation result of the model improves step by step and the unreasonably segmented regions are largely corrected. The experimental results demonstrate the effectiveness of the interactive segmentation method based on graffiti regularization provided by the invention.
In summary, the invention provides a novel strategy for spatial segmentation of MSI data that lets a user convert prior knowledge into graffiti information, which constrains the model as a regularization term and improves the plausibility of the segmentation result. The proposed method comprises two phases: 1) pre-training of an unsupervised segmentation model, in which the model is pre-trained on the low-dimensional embedded image of the MSI data to obtain a preliminary segmentation result; and 2) fine-tuning, in which the user defines graffiti information according to prior knowledge and the parameters of the segmentation model are fine-tuned with this information as a regularization term to obtain an improved segmentation result. The invention designs an effective loss function so that introducing graffiti regularization improves the stability and segmentation performance of the DNN model. The results of the embodiment show that the method achieves a fine segmentation of organs and sub-organs in the complex MSI data of a mouse fetus and yields segmentation results that are more relevant to the research question. The interactive segmentation method developed by the invention can become a powerful tool for biomedical research and is expected to be broadly applicable to other hyperspectral imaging techniques.
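The preliminary segmentation of phase 1 is read off the response of the pre-trained model: each pixel receives the index of its strongest channel as its label, and each label corresponds to one cluster. A minimal sketch, in which the function name and the 2 × 2 response values are hypothetical:

```python
def labels_from_response(response):
    """response[a][b] holds the q channel responses of pixel (a, b).

    Returns the A x B label map in which every pixel carries the index of
    its maximum channel.
    """
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in response]


# Hypothetical response matrix R for a 2 x 2 image with q = 3 channels.
R = [[[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]],
     [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]]
labels = labels_from_response(R)
```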

Claims (7)

1. A method of interactive spatial segmentation of mass spectrometry imaging data, comprising the steps of:
1) sample preparation and mass spectrometry imaging data acquisition: cryosectioning a biological sample to be tested to obtain a tissue section, performing a matrix-assisted laser desorption/ionization (MALDI) experiment on the obtained tissue section, and collecting raw MSI data;
2) data preprocessing: exporting the raw MSI data acquired in step 1) from the instrument, and preprocessing the data to remove matrix noise introduced during sample preparation and data acquisition to obtain high-dimensional MSI data X;
3) data dimensionality reduction: mapping the high-dimensional MSI data X to a low-dimensional embedding space with an unsupervised dimensionality reduction algorithm to obtain an embedded image E;
4) unsupervised segmentation: constructing a segmentation model based on a deep neural network, and training the parameters of the segmentation model in an unsupervised manner on the embedded image E obtained in step 3) to obtain a pre-segmentation result of the embedded image E;
5) regularized fine-tuning of the model: according to the prior knowledge of the user, creating graffiti information in the regions where the pre-segmentation result is unreasonable to obtain a graffiti image; converting the graffiti information into a regularization constraint term, and fine-tuning the parameters of the segmentation model to obtain an improved spatial segmentation result.
2. The method of claim 1, wherein in step 1), the sample preparation and the acquisition of the mass spectrometry imaging data comprise the following steps:
(1) cryosectioning the sample to be tested by a direct cryosectioning method, thaw-mounting the section onto a cold indium tin oxide (ITO)-coated conductive glass slide, and storing it in a freezer at -80 °C until the MSI experiment;
(2) for the MSI experiment, taking out the glass slide, placing it in a vacuum chamber for 30 min to thaw and dehydrate, then uniformly spraying the matrix N-(1-naphthyl)ethylenediamine dihydrochloride onto the surface of the section, and acquiring data with a MALDI-TOF mass spectrometer.
3. The method of claim 1, wherein in step 2), the data preprocessing comprises performing peak alignment, peak extraction and peak merging on the data so that the mass-to-charge ratio coordinates of the spectral peaks are consistent across all pixels, and deleting spectral peaks that occur with low frequency across the pixels to obtain a high-dimensional MSI data set X.
4. The method of claim 1, wherein in step 2), the data preprocessing comprises the following steps:
(1) converting the format of the raw MSI data with the SCiLS Lab software supplied with the MSI instrument, and exporting the MSI data as an imzML file;
(2) after reading the imzML file, performing preprocessing such as spectral peak alignment, spectral peak extraction and spectral peak merging so that the mass-to-charge ratio (m/z) coordinates of the spectral peaks are consistent across all pixels;
(3) deleting ion peaks that occur in fewer than 5% of the pixels using an in-house Python script to obtain the preprocessed high-dimensional MSI data set X.
5. The method of claim 1, wherein in step 3), the dimensionality reduction algorithm is an unsupervised dimensionality reduction method comprising uniform manifold approximation and projection (UMAP).
6. The interactive spatial segmentation method for mass spectrometry imaging data of claim 1, wherein in step 4), the unsupervised segmentation specifically comprises the following steps:
(1) constructing a segmentation model based on a deep neural network and designing a corresponding loss function, such that pixels that have similar features and are spatially adjacent are assigned to the same cluster as far as possible, while the number of clusters in the segmentation result is as large as possible;
(2) pre-training the segmentation model using the embedded image E as the training sample to obtain a response matrix R;
(3) taking, for each pixel, the channel index of the maximum value of its feature in the response matrix R as the label of that pixel to obtain the pre-segmentation result, in which each label corresponds to one cluster.
7. The method of claim 1, wherein in step 5), the step of regularizing and fine-tuning the model comprises:
(1) creating a blank graffiti image using the segmentation result as a template, the user creating, based on prior knowledge, graffiti information at the positions corresponding to unreasonable clusters in the segmentation result to obtain a graffiti image;
(2) converting the graffiti image into a regularization constraint term and adding it to the loss function of the segmentation model pre-trained in step 4), so that pixels with the same graffiti mark are assigned to the same cluster as far as possible and pixels with different graffiti marks are assigned to different clusters;
(3) fine-tuning the parameters of the pre-trained segmentation model, with the regularization constraint, using the embedded image E to obtain an improved segmentation result.
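The peak filtering of claim 4, step (3), can be illustrated by the following sketch. It is not the in-house script referred to in the claim: the function name `drop_rare_peaks`, the list-of-lists data layout and the reading of "frequency" as the fraction of pixels in which a peak has non-zero intensity are all assumptions.

```python
def drop_rare_peaks(intensities, min_fraction=0.05):
    """Remove m/z bins whose peak occurs in fewer than min_fraction of pixels.

    intensities: one list per pixel, one intensity per m/z bin (0 = no peak).
    Returns the filtered data and the indices of the retained bins.
    """
    n_pixels = len(intensities)
    n_bins = len(intensities[0])
    keep = []
    for j in range(n_bins):
        occurrences = sum(1 for px in intensities if px[j] > 0)
        if occurrences / n_pixels >= min_fraction:
            keep.append(j)
    return [[px[j] for j in keep] for px in intensities], keep


# Hypothetical data: 40 pixels and 3 m/z bins.
pixels = [[5.0, 0.0, 0.0] for _ in range(40)]
pixels[0][1] = 2.0          # bin 1 occurs in 1 of 40 pixels (2.5 %)
for idx in (3, 7, 11):
    pixels[idx][2] = 1.5    # bin 2 occurs in 3 of 40 pixels (7.5 %)
filtered, kept = drop_rare_peaks(pixels)
```

In this example bin 1 falls below the 5% threshold and is removed, while bins 0 and 2 are retained.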
CN202210072775.4A 2022-01-21 2022-01-21 Interactive space segmentation method for mass spectrum imaging data Active CN114494175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210072775.4A CN114494175B (en) 2022-01-21 2022-01-21 Interactive space segmentation method for mass spectrum imaging data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210072775.4A CN114494175B (en) 2022-01-21 2022-01-21 Interactive space segmentation method for mass spectrum imaging data

Publications (2)

Publication Number Publication Date
CN114494175A (en) 2022-05-13
CN114494175B (en) 2024-05-03

Family

ID=81472895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210072775.4A Active CN114494175B (en) 2022-01-21 2022-01-21 Interactive space segmentation method for mass spectrum imaging data

Country Status (1)

Country Link
CN (1) CN114494175B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860612A (en) * 2020-06-29 2020-10-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Unsupervised hyperspectral image hidden low-rank projection learning feature extraction method
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN113643275A (en) * 2021-08-29 2021-11-12 浙江工业大学 Ultrasonic defect detection method based on unsupervised manifold segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
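刘汉强 (Liu Hanqiang); 赵静 (Zhao Jing): "Color image segmentation algorithm based on semi-supervised superpixel spectral clustering", 计算机工程与应用 (Computer Engineering and Applications), no. 14, 11 August 2017 (2017-08-11), pages 191 - 195 *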

Also Published As

Publication number Publication date
CN114494175B (en) 2024-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant