CN114494175A - Interactive space segmentation method for mass spectrum imaging data

Info

Publication number: CN114494175A
Application number: CN202210072775.4A
Authority: CN
Inventors: 董继扬; 郭磊; 胡振兴; 徐向南; 许晶晶
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2022-01-21
Filing date: 2022-01-21
Publication date: 2022-05-13
Anticipated expiration: 2042-01-21
Also published as: CN114494175B

Abstract

An interactive spatial segmentation method for mass spectrometry imaging data, relating to the technical field of mass spectrometry imaging data analysis. The method comprises the following steps: preprocessing the raw data by spectral peak alignment, spectral peak extraction, spectral peak merging and the like to obtain mass spectrometry imaging data; performing unsupervised dimensionality reduction on the mass spectrometry imaging data to obtain an embedded image; constructing an unsupervised segmentation model to pre-segment the embedded image; representing the user's prior knowledge as scribble information that imposes a regularization constraint on the segmentation model; and interactively fine-tuning the model in regions where the pre-segmentation result is unreasonable, to obtain a more reasonable segmentation result. The method makes full use of the user's incomplete and inaccurate prior knowledge and improves the reasonableness of the segmentation result; it avoids the problem that the results produced by an unsupervised segmentation model are unstable or irrelevant to the research question, and improves the reliability and accuracy of subsequent analysis of the mass spectrometry imaging data.

Description

Interactive space segmentation method for mass spectrum imaging data

Technical Field

The invention relates to the technical field of Mass Spectrometry Imaging (MSI) data analysis, in particular to an interactive space segmentation method for mass spectrometry imaging data.

Background

Mass Spectrometry Imaging (MSI) combines the molecular information of mass spectrometry with the spatial information of imaging. It can qualitatively analyze, quantify and localize, in situ, many kinds of biomolecules in a biological sample, such as proteins, lipids and small-molecule metabolites. It is label-free, high-throughput and highly sensitive, and is an important analytical tool for drug development, tumor heterogeneity analysis and screening of biomarkers related to disease progression. The spatial resolution of currently advanced MSI instruments can reach 20 μm or better, and each spectrum (pixel) contains 10⁴-10⁵ ion peaks. MSI can therefore provide massive amounts of biomolecular information, but the high dimensionality, low signal-to-noise ratio and small sample size of MSI data pose great challenges for subsequent biomedical analysis and interpretation.

Spatial segmentation is a key preprocessing step in MSI data analysis, and divides pixels with similar spectral features and spatial distribution in MSI data into the same cluster to form a spatial segmentation image, so that a basis is provided for identification of regions of interest (ROIs) of a sample, and a powerful tool is provided for visualization of high-dimensional MSI data. Accurate spatial segmentation is the key to subsequent analysis and biological interpretation of MSI data, and can provide theoretical basis for mining potential molecular patterns in biological samples.

MSI spatial segmentation algorithms fall into two major categories, supervised and unsupervised. A supervised method uses data with known labels to build and train a classification model, and then predicts the classes of unlabeled data. Segmentation results obtained by supervised methods are usually highly interpretable, but the accuracy and completeness of the labels strongly affect their reliability. In actual MSI data, labels are often absent, incomplete or inaccurate, so supervised methods are of limited use in MSI data analysis. An unsupervised method, by contrast, segments using only the spectral features and spatial distribution of the pixels themselves and needs no additional labels, so it is more practical for high-dimensional MSI data and is widely applied in research. For example, Abdelmoula et al. (W. M. Abdelmoula et al., PNAS, 2016) performed unsupervised spatial segmentation of MSI data by combining t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction with k-means clustering, and the resulting segmentation revealed spatial heterogeneity of proteins in different microdomains of breast and gastric cancers.

However, MSI data are very complex: the molecular signal intensity in each pixel is determined not only by the tissue sample itself but also by technical factors such as ionization efficiency and matrix effects. As a result, the segmentation results produced by unsupervised methods, which lack guidance, often fail to represent MSI data effectively, in two main respects. On the one hand, unsupervised methods are by design biased towards segmentation results made up of large sub-regions. When the algorithm pursues a global optimum, small but important sub-regions of the tissue sample are often merged into one large region. On the other hand, unsupervised methods often capture meaningless regions created by technical factors, which are usually unrelated to the biological question under study. The application of unsupervised methods is therefore greatly limited, and there is a strong need for a method that improves unsupervised segmentation so that the results are more stable and more relevant to the research question.

In actual research, in contrast to the assumption of unsupervised methods that nothing is known about the MSI data in advance, a good deal of incomplete information about a sample is usually available, such as partial region boundaries and background noise. Using this incomplete prior knowledge to improve unsupervised results is a good solution, but it has rarely been reported.

Disclosure of Invention

The invention aims to provide an interactive spatial segmentation method for mass spectrometry imaging data that is flexible and stable, interacts with the user in a friendly way through scribbles, and effectively addresses the problems of existing unsupervised segmentation methods, such as unstable results and segmentation results irrelevant to the research question.

The invention comprises the following steps:

1) sample preparation and mass spectrometry imaging data acquisition: cryosectioning a biological sample to be tested to obtain a histological section, performing a matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) experiment on the obtained histological section, and collecting the raw Mass Spectrometry Imaging (MSI) data;

2) data preprocessing: exporting MSI original data acquired in the step 1) from an instrument, and performing data preprocessing to remove matrix noise introduced in the sample preparation and data acquisition processes to obtain high-dimensional MSI data X;

3) data dimensionality reduction: mapping the high-dimensional MSI data X into a low-dimensional embedding space with an unsupervised dimensionality reduction algorithm to obtain an embedded image E;

4) unsupervised segmentation: constructing a deep-neural-network segmentation model, and using the embedded image E obtained in step 3) to train the parameters of the segmentation model without supervision, obtaining a pre-segmentation result of the embedded image E;

5) model regularization fine-tuning: according to the user's prior knowledge, creating scribble information in regions where the pre-segmentation result is unreasonable to obtain a scribble image; converting the scribble information into a regularization constraint term, and fine-tuning the parameters of the segmentation model to obtain an improved spatial segmentation result.

In step 1), the sample preparation and mass spectrometry imaging data acquisition steps may be:

(1) sectioning the sample to be tested by direct cryosectioning, thaw-mounting each section onto a cold indium tin oxide (ITO)-coated conductive glass slide, and storing the slide in a freezer at -80 ℃ until the MSI analysis experiment;

(2) for the MSI analysis experiment, taking the slide out and placing it in a vacuum chamber for 30 min to thaw and dehydrate; after dehydration, uniformly spraying the matrix N-(1-naphthyl)ethylenediamine dihydrochloride (NEDC) onto the surface of the section, and acquiring data with a MALDI-TOF mass spectrometer.

In step 2), the data preprocessing comprises spectral peak alignment, spectral peak extraction, spectral peak merging and similar processing, so that the mass-to-charge ratio (m/z) coordinates of the spectral peaks are consistent across all pixels; spectral peaks that occur in only a small fraction of the pixels are then deleted to obtain a high-dimensional MSI data set X;

the data preprocessing can be specifically performed as follows:

(1) converting the format of the raw MSI data with the SCiLS Lab software supplied with the MSI instrument, and exporting the data as an imzML file;

(2) reading the imzML file and performing preprocessing such as spectral peak alignment, spectral peak extraction and spectral peak merging, so that the mass-to-charge ratio (m/z) coordinates of the spectral peaks are consistent across all pixels;

(3) deleting, with a custom Python script, ion peaks that occur in fewer than 5% of the pixels, to obtain the preprocessed high-dimensional MSI data set X.
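The frequency filter of item (3) can be sketched as follows. This is only a minimal illustration and not the inventors' script: the function name `filter_rare_peaks`, the one-dict-per-pixel data layout and the `min_fraction` argument are assumptions made for the example.

```python
from collections import Counter


def filter_rare_peaks(pixels, min_fraction=0.05):
    """Drop ion peaks that occur in fewer than `min_fraction` of the pixels.

    `pixels` holds one dict per pixel, mapping an already aligned m/z value
    to its intensity (an assumed layout for this sketch).
    """
    n_pixels = len(pixels)
    # Count in how many pixels each m/z value appears.
    counts = Counter(mz for spectrum in pixels for mz in spectrum)
    kept = sorted(mz for mz, c in counts.items() if c / n_pixels >= min_fraction)
    # Build the data matrix X: one row per pixel, one column per kept m/z.
    X = [[spectrum.get(mz, 0.0) for mz in kept] for spectrum in pixels]
    return kept, X
```

Pixels in which a retained peak is absent receive an intensity of zero here, which is one possible convention; the source text does not state how missing peaks are filled.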

In step 3), the dimensionality reduction algorithm is an unsupervised dimensionality reduction method, such as Uniform Manifold Approximation and Projection (UMAP).

In step 4), the specific steps of the unsupervised segmentation may be:

(1) constructing a segmentation model based on a deep neural network and designing a corresponding loss function, so that pixels with similar features and adjacent positions are assigned to the same cluster as far as possible, and the number of clusters in the segmentation result is as large as possible;

(2) pre-training the segmentation model with the embedded image E as the training sample to obtain a response matrix R;

(3) taking, for each pixel, the index of the channel in which the response matrix R is maximal as the label of that pixel, to obtain a pre-segmentation result in which each label corresponds to one cluster.
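The label assignment of item (3) is a per-pixel argmax over the channels of R. A minimal sketch, assuming R is stored as a nested list of shape rows × columns × channels (the function name is hypothetical):

```python
def labels_from_response(R):
    """Assign each pixel the index of its strongest response channel.

    `R` is a nested list of shape rows x cols x channels (assumed layout).
    """
    labels = []
    for row in R:
        label_row = []
        for responses in row:
            # Index of the channel with the maximum response (first on ties).
            best = max(range(len(responses)), key=responses.__getitem__)
            label_row.append(best)
        labels.append(label_row)
    return labels
```

The number of distinct values in the returned label image is the number of clusters actually used, which may be smaller than the number of channels.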

In step 5), the specific steps of the model regularization fine tuning may be:

(1) creating a blank scribble image with the segmentation result as a template; based on prior knowledge, the user creates scribble information at the positions corresponding to unreasonable clusters in the segmentation result, to obtain a scribble image;

(2) converting the scribble image into a regularization constraint term and adding it to the loss function of the segmentation model pre-trained in step 4), so that pixels with the same scribble mark are assigned to the same cluster as far as possible and pixels with different scribble marks are assigned to different clusters;

(3) fine-tuning the parameters of the pre-trained segmentation model, now carrying the regularization constraint, with the embedded image E to obtain an improved segmentation result.

The invention provides an interactive MSI spatial segmentation strategy built on an unsupervised segmentation model with two-stage learning. Incomplete and inaccurate prior knowledge is converted into scribble information that regularizes the loss function of the segmentation model, which can greatly improve the validity of the unsupervised segmentation result. The method is flexible and stable, interacts with the user in a friendly way through scribbles, and effectively addresses the problems of existing unsupervised segmentation methods, such as unstable results and segmentation results irrelevant to the research question. The invention is expected to promote the wide application of MSI technology.

Compared with the prior art, the invention has the beneficial effects that:

1) by introducing a novel regularization strategy, interactive segmentation of MSI data is realized, and a more reasonable segmentation result is obtained;

2) inaccurate and incomplete prior knowledge is expressed as scribble information and converted into a regularization constraint on the segmentation model, improving the validity of the model's segmentation result;

3) by utilizing transfer learning, a user can interact with the model for multiple times, and more flexible space segmentation is realized.

Drawings

FIG. 1 is a system block diagram of an embodiment of the invention.

FIG. 2 is a diagram illustrating the effect of the interactive segmentation performed in the present invention.

Detailed Description

In order to make the technical advantages of the present invention more clear, the following embodiments will further describe the present invention with reference to the accompanying drawings.

The invention provides an interactive segmentation method of mass spectrum imaging data, which comprises the following steps:

S1: cryosectioning a biological sample to be tested to obtain a histological section, performing a matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) experiment on the obtained histological section, and collecting the raw MSI data;

S2: exporting the raw MSI data acquired in S1 from the instrument, and performing data preprocessing such as spectral peak extraction, spectral peak identification and spectral peak merging to remove matrix noise and the like introduced during sample preparation and data acquisition in S1, obtaining a high-dimensional MSI data set for subsequent processing;

S3: projecting the high-dimensional MSI data set into a low-dimensional embedding space with a dimensionality reduction algorithm, such as Uniform Manifold Approximation and Projection (UMAP), to obtain a low-dimensional embedded image;

S4: constructing an unsupervised segmentation model based on a deep neural network, and pre-training the segmentation model with the embedded image obtained in S3 to obtain a pre-segmentation result;

S5: creating scribble information in regions where the pre-segmentation result is unreasonable according to the user's prior knowledge, converting the scribble information into a regularization constraint term, and fine-tuning the parameters of the segmentation model to obtain an improved segmentation result.

Further, the sample preparation and data collection in S1 are as follows:

A1: sectioning the sample to be tested by direct cryosectioning, thaw-mounting each section onto an ITO-coated conductive glass slide, and storing the slide in a freezer at -80 ℃ until the MSI analysis experiment;

A2: for the MSI analysis experiment, taking the slide out and placing it in a vacuum chamber for 30 min to thaw and dehydrate, then uniformly spraying the matrix N-(1-naphthyl)ethylenediamine dihydrochloride (NEDC) onto the surface of the section, and acquiring data with a MALDI-TOF mass spectrometer.

Further, the data preprocessing in S2 is as follows:

A1: converting the format of the raw MSI data with the SCiLS Lab software supplied with the MSI instrument, and exporting the data as an imzML file;

A2: reading the imzML file and performing preprocessing such as spectral peak alignment, spectral peak extraction and spectral peak merging, so that the mass-to-charge ratio (m/z) coordinates of the spectral peaks are consistent across all pixels;

A3: deleting, with a custom Python script, ion peaks that occur in fewer than 5% of the pixels, to obtain the preprocessed high-dimensional MSI data set.

Further, the process of unsupervised spatial segmentation in S4 is as follows:

A1: constructing a segmentation model based on a deep neural network and designing a corresponding loss function, so that pixels with similar features and adjacent positions are assigned to the same cluster as far as possible, and the number of clusters in the segmentation result is as large as possible;

A2: pre-training the segmentation model with the embedded image as the training sample to obtain a response matrix;

A3: taking, for each pixel, the index of the channel in which the response matrix is maximal as the label of that pixel, to obtain a pre-segmentation result in which each label corresponds to one cluster.

Further, the parameter fine-tuning of the segmentation model in S5 proceeds as follows:

A1: creating a blank scribble image with the segmentation result as a template; based on prior knowledge, the user creates scribble information at the positions corresponding to unreasonable clusters in the segmentation result, to obtain the scribble image;

A2: converting the scribble information into a regularization constraint term and adding it to the loss function of the segmentation model pre-trained in S4, so that pixels with the same scribble mark are assigned to the same cluster as far as possible and pixels with different scribble marks are assigned to different clusters;

A3: fine-tuning the parameters of the pre-trained model with the embedded image to obtain an improved segmentation result.

Specific examples are given below.

FIG. 1 is a system block diagram of an embodiment of the invention, in which X is the preprocessed high-dimensional MSI data set, E is the low-dimensional embedded image and R is the multi-channel response matrix; the diagram also shows the scribble information and the segmentation result.

The embodiment comprises the following steps:

1. sample preparation and data acquisition:

(1) The tissue sample used in the present invention was taken from a mouse raised at a research institute. After the mouse was sacrificed, the specimen was snap-frozen in liquid nitrogen and stored in a freezer at -80 ℃. The specimen was sectioned with a CryoStar Nx79 instrument (Thermo Fisher Scientific, Germany) to give frozen sections 10 μm thick.

(2) The sections were thaw-mounted on cold indium tin oxide (ITO)-coated conductive slides and dehydrated in a vacuum chamber for 30 min. After dehydration, N-(1-naphthyl)ethylenediamine dihydrochloride (NEDC) was uniformly sprayed onto the surface of the section as the matrix. Data acquisition was performed in negative ion mode using Rapiflex MALDI tissue (Bruker Daltonics, Germany) and the mass-to-charge ratio (m/z) range of the ion peak to be acquired was set to 250-.

2. Preprocessing the data and exporting it as a high-dimensional MSI data set X:

(1) The collected raw data are converted into imzML format with the SCiLS Lab software supplied with the MSI instrument.

(2) The imzML data file is read with the open-source R package 'MALDIquant', and preprocessing such as spectral peak alignment, spectral peak extraction and spectral peak merging is performed. The preprocessed MSI data are exported as csv files, each of which corresponds to one pixel of the raw data and contains the spectral peak positions and corresponding intensities acquired by the instrument. Spectral peak alignment, extraction and merging proceed as follows:

Spectral peak alignment: the ion spectrum of each pixel in the MSI data is aligned with a reference spectrum, correcting the random mass-to-charge ratio drift of the same ion peak in different pixels. The reference spectrum is chosen as the one with the highest correlation to the other spectra.

Spectral peak extraction: the ion spectrum of each pixel in the MSI data is converted into a list containing only the mass-to-charge ratio and corresponding intensity at each peak position. An identified peak must satisfy two conditions: first, it must be a local maximum within a given window (size 20 ppm); second, its intensity must be more than twice the noise level, where the noise level is estimated by the Median Absolute Deviation (MAD).

Spectral peak merging: the ion peaks of each pixel that fall within a given mass-to-charge ratio range are merged, and the merged peak intensity is set to the sum of the intensities of all ion peaks in that range.

(3) The exported csv files are merged with a custom Python script, the frequency of occurrence of each ion peak across all pixels is counted, and ion peaks occurring in fewer than 5% of the pixels are deleted to obtain the preprocessed MSI data set X.
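The two peak-extraction conditions stated in item (2), a local maximum within a 20 ppm window and an intensity above twice the MAD-estimated noise, can be sketched in plain Python as below. This is an illustrative stand-in, not the MALDIquant implementation: the function names are hypothetical, the window is assumed to be centred on each point (±10 ppm), and the MAD is assumed to be scaled by the usual constant 1.4826.

```python
from statistics import median


def mad_noise(intensities, constant=1.4826):
    """Noise level as the scaled median absolute deviation of the intensities."""
    med = median(intensities)
    return constant * median(abs(y - med) for y in intensities)


def pick_peaks(mz, intensity, window_ppm=20.0, snr=2.0):
    """Return (m/z, intensity) pairs that are local maxima above snr * noise.

    The window is taken as +/- window_ppm / 2 around each point, which is
    an assumption of this sketch.
    """
    noise = mad_noise(intensity)
    peaks = []
    for m, y in zip(mz, intensity):
        if y <= snr * noise:
            continue  # condition 2 fails: not more than twice the noise
        half_width = m * window_ppm * 1e-6 / 2.0
        # Intensities of all points that fall inside the ppm window.
        neighbours = [intensity[j] for j in range(len(mz))
                      if abs(mz[j] - m) <= half_width]
        if y >= max(neighbours):  # condition 1: local maximum in the window
            peaks.append((m, y))
    return peaks
```

Because the window is expressed in ppm, its absolute width grows with m/z, so the same setting is stricter at low mass than at high mass.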

3. The preprocessed high-dimensional MSI data set X is mapped into a low-dimensional space with the unsupervised dimensionality reduction algorithm UMAP to obtain a low-dimensional embedded image E of the MSI data.
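A dimensionality reduction algorithm returns one low-dimensional vector per pixel, so forming the embedded image E requires placing each vector back at its pixel position. The sketch below shows only this rearrangement and assumes the embedding has already been computed. The function name, the 0-based (row, column) coordinates and the zero fill for positions without a spectrum are all assumptions of the example.

```python
def embedding_to_image(coords, embedding, fill=0.0):
    """Arrange per-pixel embedding vectors into an A x B x d image.

    `coords[k]` is the (row, col) position of the pixel whose low-dimensional
    vector is `embedding[k]`; positions without a spectrum are set to `fill`.
    """
    n_rows = max(r for r, _ in coords) + 1
    n_cols = max(c for _, c in coords) + 1
    dim = len(embedding[0])
    image = [[[fill] * dim for _ in range(n_cols)] for _ in range(n_rows)]
    for (r, c), vector in zip(coords, embedding):
        image[r][c] = list(vector)
    return image
```

With a three-dimensional embedding, the resulting image can also be displayed directly as an RGB picture after rescaling each channel.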

4. The unsupervised segmentation model is pre-trained with the low-dimensional embedded image E to obtain a pre-segmentation result:

(1) A deep-neural-network segmentation model comprising N CNN modules is constructed, wherein each of the first N-1 CNN modules comprises one 2D convolutional layer with a 3 × 3 kernel, one ReLU activation layer and one Batch Normalization (BN) layer, and the last CNN module comprises one 2D convolutional layer with a 1 × 1 kernel and one BN layer.

(2) The segmentation model is trained with the low-dimensional embedded image E to obtain a multi-channel response matrix R; the index of the channel in which R is maximal is then taken as the cluster label of each pixel, giving the pre-segmentation result.

(3) The loss function of the segmentation model is designed as formula (1), a weighted combination of three loss terms, wherein ω₁, ω₂ and ω₃ are the weights of the first, second and third loss terms respectively. In the definitions of the three terms, a and b denote the abscissa and ordinate of a pixel in the MSI data, i denotes the channel of the pixel, and H = (C_{a,b,i})_{A×B} is the one-hot encoding of the initial segmentation result. The first loss term causes pixels with similar features to be assigned to the same cluster as far as possible, the second causes spatially adjacent pixels to be assigned to the same cluster as far as possible, and the third makes the number of segmented regions as large as possible.
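The roles of the three loss terms described above can be illustrated with a small pure-Python sketch. The term definitions used here are assumptions chosen for illustration (cross-entropy against the argmax label, mean absolute difference between neighbouring responses, and negative entropy of the mean cluster assignment). They are not the formulas of the patent, and the function names and default weights are hypothetical.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def pretraining_loss(R, w1=1.0, w2=1.0, w3=1.0):
    """Weighted sum of three illustrative loss terms for a response map R.

    R is a nested list of shape A x B x q. All three term definitions are
    assumptions of this sketch.
    """
    A, B, q = len(R), len(R[0]), len(R[0][0])
    P = [[softmax(R[a][b]) for b in range(B)] for a in range(A)]

    # Term 1 (feature similarity): cross-entropy between the soft assignment
    # and the one-hot argmax label of the same pixel.
    l_sim = -sum(math.log(max(p)) for row in P for p in row) / (A * B)

    # Term 2 (spatial continuity): mean absolute response difference between
    # vertically and horizontally adjacent pixels.
    diffs = []
    for a in range(A):
        for b in range(B):
            if a + 1 < A:
                diffs += [abs(R[a + 1][b][i] - R[a][b][i]) for i in range(q)]
            if b + 1 < B:
                diffs += [abs(R[a][b + 1][i] - R[a][b][i]) for i in range(q)]
    l_con = sum(diffs) / len(diffs) if diffs else 0.0

    # Term 3 (many clusters): negative entropy of the mean assignment, which
    # is lowest when all q channels are used equally often.
    mean_p = [sum(P[a][b][i] for a in range(A) for b in range(B)) / (A * B)
              for i in range(q)]
    l_num = sum(p * math.log(p) for p in mean_p if p > 0.0)

    return w1 * l_sim + w2 * l_con + w3 * l_num, (l_sim, l_con, l_num)
```

The second and third terms pull in opposite directions: a perfectly flat response map has zero continuity loss but uses a single cluster, so the weights decide how finely the image is divided.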

(4) The segmentation model is pre-trained with a stochastic gradient descent (SGD) optimizer: the loss is computed and back-propagated to update the model parameters, with a learning rate of 0.01 and a momentum of 0.9.
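For reference, one common form of the SGD-with-momentum update is sketched below on a toy problem. The source states only the optimizer type, the learning rate and the momentum, so the exact update convention (velocity accumulated as momentum × velocity + gradient, then scaled by the learning rate) is an assumption, as is the function name.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update; returns new parameters and velocity."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Toy example: minimise f(x) = x^2, whose gradient is 2x.
x, v = [1.0], [0.0]
for _ in range(200):
    x, v = sgd_momentum_step(x, [2.0 * x[0]], v, lr=0.01, momentum=0.9)
```

The later fine-tuning stage uses the same kind of optimizer with a ten times smaller learning rate (0.001), which keeps the fine-tuned parameters close to the pre-trained ones.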

5. Using the scribble information created through user interaction as a regularization constraint term, the parameters of the segmentation model are fine-tuned to obtain an improved segmentation result:

(1) A blank scribble image is created with the pre-segmentation result as a template. Based on prior knowledge, the user draws scribbles at the positions corresponding to unreasonable clusters in the pre-segmentation result to obtain a scribble image, wherein a scribble is a set of continuous or discrete pixels having a defined class.

(2) The scribble image is converted into a regularization term and added to the loss function of the segmentation model, and the model parameters are fine-tuned to obtain the segmentation result after scribble interaction.

(3) During training, a regularization term is defined for the scribble information, in which G = (G_{a,b,i})_{A×B×q} is the one-hot encoding of the scribble image. The regularization constraint term causes pixels with the same scribble mark to be assigned to the same cluster as far as possible.

The regularization term is added to the loss function of the model to give formula (6), wherein ω₁, ω₂, ω₃ and the three loss terms are the same as in formula (1); ω₄ is the weight of the regularization term; and k is 1 if the pixel is a scribbled pixel and 0 otherwise. The parameters of the segmentation model are fine-tuned with the new loss function, formula (6).

(4) The segmentation model is fine-tuned with an SGD optimizer, starting from the pre-trained parameters: the loss is computed and back-propagated to update the model parameters, with a learning rate of 0.001 and a momentum of 0.9.
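The scribble handling of step 5 can be sketched as follows: a blank scribble image is filled with user strokes, one-hot encoded, and compared with the model response only at scribbled pixels (k = 1). The masked cross-entropy used here is an assumed form of the regularization term, not the patent's formula, and all names, including the `UNLABELED` marker, are hypothetical.

```python
import math

UNLABELED = -1  # marker for pixels the user did not scribble on (assumed)


def make_scribble_image(n_rows, n_cols, strokes):
    """Blank scribble image with user strokes; `strokes` maps class -> pixels."""
    S = [[UNLABELED] * n_cols for _ in range(n_rows)]
    for cls, pixels in strokes.items():
        for a, b in pixels:
            S[a][b] = cls
    return S


def one_hot(S, q):
    """One-hot encode the scribble image into G with shape A x B x q."""
    return [[[1.0 if s == i else 0.0 for i in range(q)] for s in row]
            for row in S]


def scribble_loss(R, G):
    """Mean cross-entropy between softmax(R) and G over scribbled pixels only."""
    total, n_marked = 0.0, 0
    for r_row, g_row in zip(R, G):
        for r, g in zip(r_row, g_row):
            k = 1 if any(g) else 0  # k = 1 only for scribbled pixels
            if k:
                m = max(r)
                log_z = m + math.log(sum(math.exp(x - m) for x in r))
                total -= sum(gi * (ri - log_z) for gi, ri in zip(g, r))
                n_marked += 1
    return total / n_marked if n_marked else 0.0
```

In fine-tuning, a term of this kind would be multiplied by ω₄ and added to the pre-training loss, so unscribbled pixels are still governed by the three unsupervised terms.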

According to an embodiment of the invention, MSI data of a mouse fetus are interactively segmented. Because a mouse fetus sample contains multiple organs and its MSI data are very complex, spatial segmentation of these data is a difficult task. FIG. 2 shows the segmentation result of the unsupervised segmentation model without scribble constraints. Some organs and sub-organs are not accurately delineated, such as the hippocampus, midbrain and brainstem in the brain region, as well as the tongue and the heart. Over several rounds of scribble interaction, the segmentation result of the model improves step by step, and the unreasonably segmented regions are greatly improved. The experimental results demonstrate the effectiveness of the scribble-regularized interactive segmentation method provided by the invention.

In summary, the present invention develops a novel strategy for spatial segmentation of MSI data that allows a user to convert prior knowledge into scribble information, which constrains the model through regularization and improves the reasonableness of its segmentation results. The proposed method comprises two stages: 1) unsupervised pre-training, in which the segmentation model is pre-trained with the low-dimensional MSI embedded image to obtain a preliminary segmentation result; 2) fine-tuning, in which the user defines scribble information according to prior knowledge and this information is used as a regularization term to fine-tune the parameters of the segmentation model, giving an improved segmentation result. The invention designs an effective loss function, so that introducing scribble regularization improves the stability and segmentation performance of the DNN model. The results of the embodiment show that the method achieves a fine segmentation of organs and sub-organs in complex MSI data of a mouse fetus, producing segmentation results that are more relevant to the research question. The interactive segmentation method developed by the invention can become a powerful tool for biomedical research and is expected to be applied to other hyperspectral imaging technologies as well.

Claims

1. A method of interactive spatial segmentation of mass spectrometry imaging data, comprising the steps of:

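1) sample preparation and mass spectrometry imaging data acquisition: cryosectioning a biological sample to be tested to obtain a histological section, performing a matrix-assisted laser desorption/ionization experiment on the obtained histological section, and collecting the raw MSI data;

2) data preprocessing: exporting the raw MSI data acquired in step 1) from the instrument, and performing data preprocessing to remove matrix noise introduced during sample preparation and data acquisition, obtaining high-dimensional MSI data X;

3) data dimensionality reduction: mapping the high-dimensional MSI data X into a low-dimensional embedding space with an unsupervised dimensionality reduction algorithm to obtain an embedded image E;

4) unsupervised segmentation: constructing a deep-neural-network segmentation model, and using the embedded image E obtained in step 3) to train the parameters of the segmentation model without supervision, obtaining a pre-segmentation result of the embedded image E;

5) model regularization fine-tuning: according to the user's prior knowledge, creating scribble information in regions where the pre-segmentation result is unreasonable to obtain a scribble image; converting the scribble information into a regularization constraint term, and fine-tuning the parameters of the segmentation model to obtain an improved spatial segmentation result.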

2. The method of claim 1, wherein in step 1), the sample preparation and the acquisition of the mass spectrometry imaging data comprise the following steps:

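(1) sectioning the sample to be tested by direct cryosectioning, thaw-mounting each section onto a cold indium tin oxide-coated conductive glass slide, and storing the slide in a freezer at -80 ℃ until the MSI analysis experiment;

(2) for the MSI analysis experiment, taking the slide out and placing it in a vacuum chamber for 30 min to thaw and dehydrate; after dehydration, uniformly spraying the matrix N-(1-naphthyl)ethylenediamine dihydrochloride onto the surface of the section, and acquiring data with a MALDI-TOF mass spectrometer.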

3. The method of claim 1, wherein in step 2), the data preprocessing comprises performing peak alignment, peak extraction and peak combination on the data to make the mass-to-charge ratio coordinates of the peaks of all pixels consistent; the small probability spectral peaks in the pixels are deleted to obtain a high dimensional MSI dataset X.

4. The method of claim 1, wherein in step 2), the data preprocessing comprises the following steps:

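(1) converting the format of the raw MSI data with the SCiLS Lab software supplied with the MSI instrument, and exporting the data as an imzML file;

(2) reading the imzML file and performing preprocessing such as spectral peak alignment, spectral peak extraction and spectral peak merging, so that the mass-to-charge ratio (m/z) coordinates of the spectral peaks are consistent across all pixels;

(3) deleting, with a custom Python script, ion peaks that occur in fewer than 5% of the pixels, to obtain the preprocessed high-dimensional MSI data set X.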

5. The method of claim 1, wherein in step 3), the dimensionality reduction algorithm is an unsupervised dimensionality reduction method comprising Uniform Manifold Approximation and Projection (UMAP).

6. The interactive spatial segmentation method for mass spectrometry imaging data of claim 1, wherein in step 4), the unsupervised segmentation specifically comprises the following steps:

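(1) constructing a segmentation model based on a deep neural network and designing a corresponding loss function, so that pixels with similar features and adjacent positions are assigned to the same cluster as far as possible, and the number of clusters in the segmentation result is as large as possible;

(2) pre-training the segmentation model with the embedded image E as the training sample to obtain a response matrix R;

(3) taking, for each pixel, the index of the channel in which the response matrix R is maximal as the label of that pixel, to obtain a pre-segmentation result in which each label corresponds to one cluster.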

7. The method of claim 1, wherein in step 5), the step of regularizing and fine-tuning the model comprises:

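(1) creating a blank scribble image with the segmentation result as a template; based on prior knowledge, the user creates scribble information at the positions corresponding to unreasonable clusters in the segmentation result, to obtain a scribble image;

(2) converting the scribble image into a regularization constraint term and adding it to the loss function of the segmentation model pre-trained in step 4), so that pixels with the same scribble mark are assigned to the same cluster as far as possible and pixels with different scribble marks are assigned to different clusters;

(3) fine-tuning the parameters of the pre-trained segmentation model, now carrying the regularization constraint, with the embedded image E to obtain an improved segmentation result.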