WO2023010660A1 - Method for predicting and evaluating the function of biological materials - Google Patents

Method for predicting and evaluating the function of biological materials

Info

Publication number
WO2023010660A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
evaluation
sample
tested
data
Prior art date
Application number
PCT/CN2021/119233
Other languages
English (en)
French (fr)
Inventor
邓旭亮
周莹莹
张学慧
平现凤
Original Assignee
北京大学口腔医学院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学口腔医学院
Publication of WO2023010660A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • The invention relates to an evaluation model for biological materials, and in particular to a method for predicting and evaluating the function of biological materials.
  • Evaluation of medical materials, in China and abroad, falls into two main areas: physicochemical performance evaluation and biological evaluation.
  • Biological evaluation focuses on toxicity and safety assessment, and there is no unified system for functional evaluation.
  • For example, the evaluation of a biomaterial's ability to regulate stem cell fate is not yet included in the national standards for evaluating the effectiveness and safety of medical biomaterials. Material evaluation data in this area are therefore generated by individual biomaterial research laboratories. Because characterization methods and techniques lack uniform standards, the sample databases are heterogeneous. Furthermore, most current functional evaluation experiments are limited to a single indicator.
  • The identity of a cell is reflected in the expression of specific genes, so cell types are currently often identified by the expression of a single specific gene. Examples include qPCR detection, at the gene level, of genes highly expressed in osteoblasts such as BMP2, Runx2, and COL1, or Western Blot detection, at the protein level, of osteocalcin (OCN) and bone-derived alkaline phosphatase (ALP).
  • The present invention addresses the technical problems of existing evaluation methods, namely that they are labor-intensive, have long experimental cycles, and rely on highly heterogeneous sample libraries, and provides a high-accuracy, predictive method for predicting and evaluating the function of biological materials.
  • The present invention provides a method for predicting and evaluating the function of biological materials, comprising the following steps: (1) culturing human bone marrow mesenchymal stem cells in the environment of the material to be tested; (2) collecting the human bone marrow mesenchymal stem cells cultured in step (1), extracting total RNA, purifying it and building a library, and sequencing the transcriptome to obtain the transcriptome data of the sample to be tested; (3) subjecting the transcriptome data obtained in step (2) to batch effect correction and feature extraction, then inputting them into the function prediction evaluation model to calculate the confidence that the sample to be tested belongs to each of the different cell types.
  • Preferably, the function prediction evaluation model in step (3) is constructed by the following steps: (a) dividing the transcriptome data of the sample to be tested obtained in step (2) into a training set and a test set, and performing batch effect correction on each; (b) extracting the gene expression features of the four cell types from the training set data, and performing feature extraction on the transcriptome data; (c) training machine learning models on the training set data and optimizing them to obtain the Ensemble Learning intelligent prediction model; (d) inputting the test set data into the Ensemble Learning intelligent prediction model to obtain the predicted cell type of each test set sample, comparing it with the sample's true cell type, and calculating the accuracy and recall of the model.
  • Preferably, in step (a), the batch effect correction is based on an integrated optimization of the ComBatseq algorithm and the DaMiRseq algorithm. The sample type and batch of the training set are known; the sample type of the test set is unknown, so batch effect correction of the test set is based on the parameters produced by batch effect correction of the training set, and each test set is corrected independently.
  • Preferably, in step (b), the feature extraction is based on an integrated extraction with the DaMiRseq algorithm and the DESeq2 algorithm. After batch effect correction of the training set, the characteristic expressed genes of the four cell types are extracted according to sample type; the expression matrix of the characteristic genes is then extracted separately from the batch-corrected training set and test set data.
  • Preferably, in step (c), the Ensemble Learning intelligent prediction model is constructed by integrating four machine learning algorithms: Ridge Classifier CV, Support Vector Machine, Decision Tree, and Gaussian Naive Bayes. The model is first trained and optimized on the training set, and its evaluation metrics are then computed on the test set.
  • The present invention designs and constructs a biomaterial function prediction and evaluation method that uses the transcriptome as the basis for quantitative evaluation. The transcriptome of the cells to be tested is compared with previously constructed gene expression profiles of the different cell types that stem cells differentiate into, in order to obtain a full picture of the biomaterial-induced cell differentiation state.
  • Specifically, the present invention integrates four machine learning algorithms (Ridge Classifier CV, Support Vector Machine, Decision Tree, and Gaussian Naive Bayes) and trains an intelligent prediction model that can distinguish samples of four cell types: osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells.
  • Compared with traditional biomarker evaluation methods, this intelligent prediction model markedly improves the accuracy with which the four cell types are identified. In addition, RNAseq data from public databases, covering human bone marrow mesenchymal stem cells before and after chemical induction and biomaterial culture, were used as test samples and input into the prediction model built on the reference sample gene expression profile database.
  • The results showed that the cell types predicted by the intelligent model were consistent with the phenotypes of the test samples.
  • Fig. 1 is a hierarchical clustering diagram of the RNAseq data obtained from public databases. Abnormal samples above the horizontal line are removed on the basis of the correlation coefficients between samples, and the retained samples are used to build the reference sample gene expression profile database;
  • Fig. 2(a), Fig. 2(b), Fig. 2(c), and Fig. 2(d) show, before and after batch effect correction, bar charts of the percentage of variance explained by each variable and box plots of gene expression for the reference sample gene expression profile database;
  • Fig. 2(a) shows that before batch effect correction, the percentage of variance explained by batch in the reference database is markedly higher than that explained by cell type, indicating that the differences between samples are mainly due to batch effects;
  • Fig. 2(b) shows that before batch effect correction, the gene expression distribution of samples in the reference database is inconsistent across batches, so there is an obvious batch effect;
  • Fig. 2(c) shows that after batch effect correction, the percentage of variance explained by cell type in the reference database rises markedly and exceeds that explained by batch;
  • Fig. 2(d) shows that after batch effect correction, the gene expression distribution of samples in the reference database becomes consistent across batches, so the batch effect is clearly corrected;
  • Fig. 3(a) and Fig. 3(b) are tSNE dimensionality-reduction visualizations of the samples in the reference database before and after data preprocessing. Fig. 3(a) shows that before preprocessing, the dimension-reduced samples cluster by batch; Fig. 3(b) shows that after the two preprocessing steps of batch effect correction and feature extraction, the dimension-reduced samples cluster by cell type, so that samples of the same cell type group together in the visualization;
  • Fig. 4 is a gene expression heat map, after feature extraction, of samples of the four cell types: osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells. It shows that once the expression profiles of the characteristic genes are extracted, the four cell types differ clearly.
  • The ordinate is the gene name and the abscissa is the sample;
  • Fig. 5(a) and Fig. 5(b) compare the accuracy of classical machine learning models in predicting sample cell type and show the receiver operating characteristic curve of the optimized intelligent prediction model;
  • Fig. 5(a) shows that, over 100 rounds of cross-validation on the training set, the random forest model, the support vector machine model, the Gaussian distribution model, the linear discriminant analysis model, and the Ensemble Learning intelligent prediction model built by combining four models all predict the four cell types with accuracy above 90%;
  • Fig. 5(b) shows the receiver operating characteristic curve (ROC curve) of the optimized Ensemble Learning intelligent prediction model. The ordinate is the true positive rate and the abscissa is the false positive rate. The average ROC curve lies close to the upper left corner and the area under the curve (AUC value) is close to 1, indicating that the prediction model classifies very well;
  • Fig. 6 is the classification evaluation report of the optimized intelligent prediction model.
  • RNAseq data from public databases, covering human bone marrow mesenchymal stem cells before and after three chemical induction treatments (osteogenic, chondrogenic, and adipogenic), are used as test samples.
  • These are input into the intelligent prediction model, which computes the predicted cell type of each sample, so that the model's classification performance can be evaluated. All four classes of test samples obtain high F1 scores.
  • Taking precision and recall together, this shows that the intelligent prediction model classifies samples of the four cell types (osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells) well;
  • Fig. 7 is a flow chart of the method for constructing the function prediction evaluation model.
  • The invention provides a method for predicting and evaluating the function of biological materials, comprising the following steps: (1) culturing human bone marrow mesenchymal stem cells in the environment of the material to be tested; (2) collecting the human bone marrow mesenchymal stem cells cultured in step (1), extracting total RNA, purifying it and building a library, and sequencing the transcriptome; (3) subjecting the transcriptome data of the sample to be tested (that is, the sample data obtained in step (2)) to batch effect correction and feature extraction, then inputting them into the function prediction evaluation model of the invention (the Ensemble Learning intelligent prediction model built by integrating four machine learning algorithms: Ridge Classifier CV, Support Vector Machine, Decision Tree, and Gaussian Naive Bayes) to calculate the confidence that the sample belongs to each of the four cell types: osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells.
  • As shown in Fig. 7, the function prediction evaluation model is constructed as follows: first, the transcriptome data are divided into a training set and a test set, and batch effect correction is performed on each; next, the gene expression features of the four cell types are extracted from the training set data, and feature extraction is performed on the transcriptome data; then, machine learning models are trained on the training set data and optimized to obtain the Ensemble Learning intelligent prediction model; finally, the test set data are input into the Ensemble Learning intelligent prediction model to obtain the predicted cell type of each test set sample.
  • The predicted cell type is compared with the sample's true cell type, and the accuracy, recall, and other metrics of the model are calculated.
  • Batch effect correction: the sample type and batch of the training set are known, and the function parameters chosen for batch effect correction are shown in Fig. 7. The sample type of the test set is unknown, so batch effect correction of the test set is based on the parameters produced by batch effect correction of the training set.
  • Each test set is corrected independently, and the function parameters chosen are shown in Fig. 7.
  • Feature extraction: after batch effect correction of the training set, the characteristic expressed genes of the four cell types are extracted according to sample type, with the function parameters shown in Fig. 7; the expression matrix of the characteristic genes is then extracted separately from the batch-corrected training set and test set data.
  • Function prediction evaluation model: the Ensemble Learning intelligent prediction model is built by integrating four machine learning algorithms: Ridge Classifier CV, Support Vector Machine, Decision Tree, and Gaussian Naive Bayes. The model is first trained and optimized on the training set, and its evaluation metrics are then calculated on the test set.
  • As shown in Fig. 5(b), the optimized Ensemble Learning approach was used to train an intelligent prediction model that can distinguish samples of four cell types: osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells.
  • The receiver operating characteristic curve shows that the Ensemble Learning intelligent prediction model, based on big data and machine learning, classifies the four cell types very well.
  • As shown in Fig. 6, RNAseq data from public databases, covering human bone marrow mesenchymal stem cells before and after three chemical induction treatments (osteogenic, chondrogenic, and adipogenic), were used as test samples and input into the intelligent prediction model, which computed the predicted cell type of each sample.
  • All four classes of test samples obtain high F1 scores, and both the precision and the recall for the osteoblast cell type are high, indicating that the Ensemble Learning intelligent prediction model reliably predicts whether samples cultured in a biomaterial environment are osteogenic.

Landscapes

  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Immunology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Microbiology (AREA)

Abstract

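The present invention relates to a method for predicting and evaluating the function of biological materials, which solves the technical problems that existing evaluation methods are labor-intensive, have long experimental cycles, and rely on highly heterogeneous sample libraries. The method comprises the following steps: (1) culturing human bone marrow mesenchymal stem cells in the environment of the material to be tested; (2) collecting the human bone marrow mesenchymal stem cells cultured in step (1), extracting total RNA, purifying it and building a library, and sequencing the transcriptome to obtain the transcriptome data of the sample to be tested; (3) subjecting the transcriptome data obtained in step (2) to batch effect correction and feature extraction, then inputting them into the function prediction evaluation model of the invention to calculate the confidence that the sample to be tested belongs to each of the different cell types. The invention can be used in the field of biomaterial function prediction and evaluation.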

Description

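Method for predicting and evaluating the function of biological materials
Technical Field
The present invention relates to an evaluation model for biological materials, and in particular to a method for predicting and evaluating the function of biological materials.
Background Art
At present, the evaluation of medical materials, in China and abroad, falls into two main areas: physicochemical performance evaluation and biological evaluation. Biological evaluation focuses on toxicity and safety assessment, and there is no unified system for functional evaluation. For example, the evaluation of a biomaterial's ability to regulate stem cell fate is not yet included in the national standards for evaluating the effectiveness and safety of medical biomaterials. Material evaluation data in this area are therefore generated by individual biomaterial research laboratories, and because characterization methods and techniques lack uniform standards, the sample databases are heterogeneous. In addition, most current functional evaluation experiments are limited to a single indicator. The identity of a cell is reflected in the expression of specific genes, so cell types are currently often identified by the expression of a single specific gene. Examples include qPCR detection, at the gene level, of genes highly expressed in osteoblasts such as BMP2, Runx2, and COL1, or Western Blot detection, at the protein level, of osteocalcin (OCN) and bone-derived alkaline phosphatase (ALP).
However, traditional single-indicator evaluation has major limitations, mainly in the following respects: (1) qPCR detection of a single gene is not sufficient to determine a cell's identity accurately, because the same gene may be highly expressed in several cell types; moreover, even if only a fraction of the cells express the gene at a high level, qPCR may still report high expression overall. (2) To improve accuracy, qPCR often has to be run on multiple genes, which wastes labor. (3) Evaluations of different materials are hard to compare: evaluations based on different indicators cannot be compared directly, and even identical indicators are hard to compare for lack of standardized quantification. (4) It cannot provide a full picture of the cell differentiation state: it neither gives the proportion of differentiated cells nor reveals whether the cells have already begun to differentiate toward bone cells.
In summary, the expression of a single biomarker molecule cannot quantitatively assess the direction of cell differentiation, and a quantifiable assessment of the overall differentiation state is lacking. As a result, research on the functional design and optimization of new biomaterials lacks theoretical and data support, the physicochemical parameters of material systems are difficult to screen and optimize at high throughput, and the biological performance of new biomaterials is not predictable.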
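Summary of the Invention
The present invention addresses the technical problems of existing evaluation methods, namely that they are labor-intensive, have long experimental cycles, and rely on highly heterogeneous sample libraries, and provides a high-accuracy, predictive method for predicting and evaluating the function of biological materials.
To this end, the present invention provides a method for predicting and evaluating the function of biological materials, comprising the following steps: (1) culturing human bone marrow mesenchymal stem cells in the environment of the material to be tested; (2) collecting the human bone marrow mesenchymal stem cells cultured in step (1), extracting total RNA, purifying it and building a library, and sequencing the transcriptome to obtain the transcriptome data of the sample to be tested; (3) subjecting the transcriptome data obtained in step (2) to batch effect correction and feature extraction, then inputting them into the function prediction evaluation model to calculate the confidence that the sample to be tested belongs to each of the different cell types.
Preferably, the function prediction evaluation model in step (3) is constructed by the following steps: (a) dividing the transcriptome data of the sample to be tested obtained in step (2) into a training set and a test set, and performing batch effect correction on each; (b) extracting the gene expression features of the four cell types from the training set data, and performing feature extraction on the transcriptome data; (c) training machine learning models on the training set data and optimizing them to obtain the Ensemble Learning intelligent prediction model; (d) inputting the test set data into the Ensemble Learning intelligent prediction model to obtain the predicted cell type of each test set sample, comparing it with the sample's true cell type, and calculating the accuracy and recall of the model.
Preferably, in step (a), the batch effect correction is based on an integrated optimization of the ComBatseq algorithm and the DaMiRseq algorithm. The sample type and batch of the training set are known; the sample type of the test set is unknown, so batch effect correction of the test set is based on the parameters produced by batch effect correction of the training set, and each test set is corrected independently.
Preferably, in step (b), the feature extraction is based on an integrated extraction with the DaMiRseq algorithm and the DESeq2 algorithm. After batch effect correction of the training set, the characteristic expressed genes of the four cell types are extracted according to sample type; the expression matrix of the characteristic genes is then extracted separately from the batch-corrected training set and test set data.
Preferably, in step (c), the Ensemble Learning intelligent prediction model is constructed by integrating four machine learning algorithms: Ridge Classifier CV, Support Vector Machine, Decision Tree, and Gaussian Naive Bayes. The model is first trained and optimized on the training set, and its evaluation metrics are then calculated on the test set.
The present invention has the following beneficial effects:
The present invention designs and constructs a biomaterial function prediction and evaluation method that uses the transcriptome as the basis for quantitative evaluation. The transcriptome of the cells to be tested is compared with previously constructed gene expression profiles of the different cell types that stem cells differentiate into, in order to obtain a full picture of the biomaterial-induced cell differentiation state.
Specifically, the present invention integrates four machine learning algorithms (Ridge Classifier CV, Support Vector Machine, Decision Tree, and Gaussian Naive Bayes) and trains an intelligent prediction model that can distinguish samples of four cell types: osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells. Compared with traditional biomarker evaluation methods, the accuracy with which the four cell types are identified is markedly improved. In addition, RNAseq data from public databases, covering human bone marrow mesenchymal stem cells before and after chemical induction and biomaterial culture, are used as test samples and input into the prediction model built on the reference sample gene expression profile database; the results show that the cell types predicted by the intelligent model agree with the phenotypes of the test samples.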
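Brief Description of the Drawings
Fig. 1 is a hierarchical clustering diagram of the RNAseq data obtained from public databases. Abnormal samples above the horizontal line are removed on the basis of the correlation coefficients between samples, and the retained samples are used to build the reference sample gene expression profile database;
Fig. 2(a), Fig. 2(b), Fig. 2(c), and Fig. 2(d) show, before and after batch effect correction, bar charts of the percentage of variance explained by each variable and box plots of gene expression for the reference sample gene expression profile database. Fig. 2(a) shows that before batch effect correction, the percentage of variance explained by batch in the reference database is markedly higher than that explained by cell type, indicating that the differences between samples stem mainly from batch effects; Fig. 2(b) shows that before batch effect correction, the gene expression distribution of samples in the reference database is inconsistent across batches, so there is an obvious batch effect; Fig. 2(c) shows that after batch effect correction, the percentage of variance explained by cell type in the reference database rises markedly and exceeds that explained by batch; Fig. 2(d) shows that after batch effect correction, the gene expression distribution of samples in the reference database becomes consistent across batches, so the batch effect is clearly corrected;
Fig. 3(a) and Fig. 3(b) are tSNE dimensionality-reduction visualizations of the samples in the reference database before and after data preprocessing. Fig. 3(a) shows that before preprocessing, the dimension-reduced samples cluster by batch; Fig. 3(b) shows that after the two preprocessing steps of batch effect correction and feature extraction, the dimension-reduced samples cluster by cell type, so that samples of the same cell type group together in the visualization;
Fig. 4 is a gene expression heat map, after feature extraction, of samples of the four cell types: osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells. It shows that once the expression profiles of the characteristic genes are extracted, the four cell types differ clearly. The ordinate is the gene name and the abscissa is the sample;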
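Fig. 5(a) and Fig. 5(b) compare the accuracy of classical machine learning models in predicting sample cell type and show the receiver operating characteristic curve of the optimized intelligent prediction model. Fig. 5(a) shows that, over 100 rounds of cross-validation on the training set, the random forest model, the support vector machine model, the Gaussian distribution model, the linear discriminant analysis model, and the Ensemble Learning intelligent prediction model built by combining four models all predict the four cell types with accuracy above 90%; Fig. 5(b) shows the receiver operating characteristic curve (ROC curve) of the optimized Ensemble Learning intelligent prediction model, with the true positive rate on the ordinate and the false positive rate on the abscissa. The average ROC curve lies close to the upper left corner and the area under the curve (AUC value) is close to 1, indicating that the prediction model classifies very well;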
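Fig. 6 is the classification evaluation report of the optimized intelligent prediction model. RNAseq data from public databases, covering human bone marrow mesenchymal stem cells before and after three chemical induction treatments (osteogenic, chondrogenic, and adipogenic), are used as test samples and input into the intelligent prediction model, which computes the predicted cell type of each sample so that the model's classification performance can be evaluated. All four classes of test samples obtain high F1 scores; taking precision and recall together, this shows that the intelligent prediction model classifies samples of the four cell types (osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells) well;
Fig. 7 is a flow chart of the method for constructing the function prediction evaluation model.
Detailed Description of the Embodiments
The present invention is further described below with reference to embodiments.
The present invention provides a method for predicting and evaluating the function of biological materials, comprising the following steps: (1) culturing human bone marrow mesenchymal stem cells in the environment of the material to be tested; (2) collecting the human bone marrow mesenchymal stem cells cultured in step (1), extracting total RNA, purifying it and building a library, and sequencing the transcriptome; (3) subjecting the transcriptome data of the sample to be tested (that is, the sample data obtained in step (2)) to batch effect correction and feature extraction, then inputting them into the function prediction evaluation model of the invention (the Ensemble Learning intelligent prediction model built by integrating four machine learning algorithms: Ridge Classifier CV, Support Vector Machine, Decision Tree, and Gaussian Naive Bayes) to calculate the confidence that the sample belongs to each of the four cell types: osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells.
As shown in Fig. 7, the function prediction evaluation model is constructed as follows: first, the transcriptome data are divided into a training set and a test set, and batch effect correction is performed on each; next, the gene expression features of the four cell types are extracted from the training set data, and feature extraction is performed on the transcriptome data; then, machine learning models are trained on the training set data and optimized to obtain the Ensemble Learning intelligent prediction model; finally, the test set data are input into the Ensemble Learning intelligent prediction model to obtain the predicted cell type of each test set sample, which is compared with the sample's true cell type to calculate the accuracy, recall, and other metrics of the model.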
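1. Batch effect correction: based on an integrated optimization of the ComBatseq algorithm and the DaMiRseq algorithm.
The sample type and batch of the training set are known, and the function parameters chosen for batch effect correction are shown in the schematic of Fig. 7. The sample type of the test set is unknown, so batch effect correction of the test set is based on the parameters produced by batch effect correction of the training set; each test set is corrected independently, with the function parameters shown in the schematic of Fig. 7.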
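The idea of correcting a test batch with parameters learned on the training set can be illustrated with a minimal Python sketch. It is not ComBatseq or DaMiRseq (both are R packages, and ComBatseq fits a negative binomial model on counts); it is a simplified per-gene location and scale adjustment on log-scale expression, and the function names `fit_reference` and `correct_test_batch` are hypothetical. Unlike ComBatseq, it does not protect cell-type differences that are confounded with batch.

```python
import numpy as np

def fit_reference(train_log_expr, train_batches):
    """Align each training batch to the pooled per-gene mean and std.

    Simplified stand-in for batch correction, not ComBat-seq itself.
    Returns the corrected matrix and the reference parameters that are
    later reused for test batches.
    """
    X = np.asarray(train_log_expr, dtype=float)   # samples x genes
    batches = np.asarray(train_batches)
    ref_mean = X.mean(axis=0)
    ref_std = X.std(axis=0) + 1e-8                # avoid division by zero
    corrected = np.empty_like(X)
    for b in np.unique(batches):
        rows = batches == b
        mu = X[rows].mean(axis=0)
        sd = X[rows].std(axis=0) + 1e-8
        corrected[rows] = (X[rows] - mu) / sd * ref_std + ref_mean
    return corrected, (ref_mean, ref_std)

def correct_test_batch(test_log_expr, reference):
    """Correct one test batch on its own, using only training-derived parameters."""
    ref_mean, ref_std = reference
    X = np.asarray(test_log_expr, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0) + 1e-8
    return (X - mu) / sd * ref_std + ref_mean
```

Each test batch is passed to `correct_test_batch` separately, which mirrors the statement that every test set is corrected independently and without knowledge of its cell-type labels.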
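2. Feature extraction: based on an integrated extraction with the DaMiRseq algorithm and the DESeq2 algorithm.
After batch effect correction of the training set, the characteristic expressed genes of the four cell types are extracted according to sample type, with the function parameters shown in the schematic of Fig. 7; the expression matrix of the characteristic genes is then extracted separately from the batch-corrected training set and test set data.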
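The two-stage pattern described here (choose characteristic genes on the training set only, then slice the same genes out of both training and test matrices) can be sketched as follows. DaMiRseq and DESeq2 are R packages and are not reproduced; the sketch substitutes a simple one-vs-rest standardized mean difference as the ranking score. The function names, the `top_k` parameter, and the scoring rule are all assumptions made for illustration.

```python
import numpy as np

def select_marker_genes(train_expr, labels, gene_names, top_k=2):
    """Pick, for each cell type, the top_k genes most elevated versus all other types.

    Stand-in scoring (standardized mean difference), not DESeq2 or DaMiRseq.
    Returns the selected column indices and the matching gene names.
    """
    X = np.asarray(train_expr, dtype=float)       # samples x genes
    y = np.asarray(labels)
    selected = []
    for cell_type in np.unique(y):
        in_class = X[y == cell_type]
        rest = X[y != cell_type]
        pooled_sd = np.sqrt((in_class.var(axis=0) + rest.var(axis=0)) / 2) + 1e-8
        score = (in_class.mean(axis=0) - rest.mean(axis=0)) / pooled_sd
        for idx in np.argsort(score)[::-1][:top_k]:
            if int(idx) not in selected:          # keep the union without duplicates
                selected.append(int(idx))
    return selected, [gene_names[i] for i in selected]

def extract_features(expr, selected_idx):
    """Slice the characteristic-gene columns out of any batch-corrected matrix."""
    return np.asarray(expr, dtype=float)[:, selected_idx]
```

Because `select_marker_genes` only ever sees training data, applying `extract_features` to the test matrix with the same indices avoids leaking test labels into feature selection.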
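3. Function prediction evaluation model: the Ensemble Learning intelligent prediction model is built by integrating four machine learning algorithms: Ridge Classifier CV, Support Vector Machine, Decision Tree, and Gaussian Naive Bayes. The model is first trained and optimized on the training set, and its evaluation metrics are then calculated on the test set.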
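The four named algorithms all have counterparts in scikit-learn (`RidgeClassifierCV`, `SVC`, `DecisionTreeClassifier`, `GaussianNB`), so one way such an ensemble might be assembled is sketched below. The text does not say how the four models are combined or how the per-cell-type confidence is computed; the majority-vote rule, the vote-fraction confidence, the default hyperparameters, and the class name `VoteEnsemble` are assumptions of this sketch.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

class VoteEnsemble:
    """Hard-voting ensemble of the four classifier families named in the text.

    Assumption: confidence for a cell type is the fraction of the four
    models that vote for it; the source does not specify the rule.
    """

    def __init__(self):
        self.models = [
            RidgeClassifierCV(),
            SVC(),
            DecisionTreeClassifier(random_state=0),
            GaussianNB(),
        ]

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        for model in self.models:
            model.fit(X, y)
        return self

    def confidence(self, X):
        # votes has shape (n_models, n_samples)
        votes = np.stack([model.predict(X) for model in self.models])
        # columns follow self.classes_; each row sums to 1
        return np.stack([(votes == c).mean(axis=0) for c in self.classes_], axis=1)

    def predict(self, X):
        # ties resolve to the first class in sorted order
        return self.classes_[self.confidence(X).argmax(axis=1)]
```

In use, `fit` would receive the characteristic-gene matrix of the training set and its known cell types, and `confidence` would be called on the preprocessed test sample to obtain one value per cell type.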
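As shown in Fig. 3(a), Fig. 3(b), and Fig. 4, after the two data preprocessing steps of batch effect correction and feature extraction, samples of the four cell types in the reference database (osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells) show clear between-class differences in their gene expression profiles.
As shown in Fig. 5(b), the optimized Ensemble Learning approach was used to train an intelligent prediction model that can distinguish samples of the four cell types: osteoblasts, chondrocytes, adipocytes, and undifferentiated mesenchymal stem cells. The receiver operating characteristic curve shows that the Ensemble Learning intelligent prediction model, based on big data and machine learning, classifies the four cell types very well.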
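For a four-class problem, an ROC curve is usually drawn one cell type at a time against all the others, and the area under it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The short sketch below computes that area directly from pairwise comparisons; the one-vs-rest treatment and the function name `auc_one_vs_rest` are assumptions, since the text does not state how the average curve in Fig. 5(b) was obtained.

```python
def auc_one_vs_rest(scores, labels, positive):
    """Area under the ROC curve for one cell type against all others.

    scores: model confidence that each sample is the `positive` type.
    Uses the rank (Mann-Whitney) formulation; ties count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

An AUC close to 1 for every cell type corresponds to curves hugging the upper left corner, which is the behavior the paragraph above attributes to the model.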
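As shown in Fig. 6, RNAseq data from public databases, covering human bone marrow mesenchymal stem cells before and after three chemical induction treatments (osteogenic, chondrogenic, and adipogenic), were used as test samples and input into the intelligent prediction model, which computed the predicted cell type of each sample so that the classification performance of the Ensemble Learning intelligent prediction model could be evaluated. All four classes of test samples obtain high F1 scores, and both the precision and the recall for the osteoblast cell type are high, indicating that the Ensemble Learning intelligent prediction model reliably predicts whether samples cultured in a biomaterial environment are osteogenic.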
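The per-class precision, recall, and F1 score mentioned for Fig. 6 follow from counts of true positives, false positives, and false negatives. A minimal sketch is given below; the function name `per_class_report` and the example labels used with it are illustrative and do not reproduce the values reported in the figure.

```python
def per_class_report(true_labels, predicted_labels):
    """Precision, recall, and F1 for every cell type that appears in either list."""
    pairs = list(zip(true_labels, predicted_labels))
    report = {}
    for cls in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

F1 is the harmonic mean of precision and recall, so a class scores well only when the model both finds most of its samples and rarely assigns other samples to it.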
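The above are merely specific embodiments of the present invention and shall not limit the scope in which the invention may be practiced. Substitutions of equivalent components, and equivalent changes and modifications made within the scope of protection of this patent, shall all still fall within the scope covered by the claims of the present invention.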

Claims (5)

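  1. A method for predicting and evaluating the function of biological materials, characterized by comprising the following steps:
    (1) culturing human bone marrow mesenchymal stem cells in the environment of the material to be tested;
    (2) collecting the human bone marrow mesenchymal stem cells cultured in step (1), extracting total RNA, purifying it and building a library, and sequencing the transcriptome to obtain the transcriptome data of the sample to be tested;
    (3) subjecting the transcriptome data of the sample to be tested obtained in step (2) to batch effect correction and feature extraction, then inputting them into a function prediction evaluation model to calculate the confidence that the sample to be tested belongs to each of the different cell types.
  2. The method for predicting and evaluating the function of biological materials according to claim 1, characterized in that the function prediction evaluation model in step (3) is constructed by the following steps:
    (a) dividing the transcriptome data of the sample to be tested obtained in step (2) into a training set and a test set, and performing batch effect correction on each;
    (b) extracting the gene expression features of the four cell types from the training set data, and performing feature extraction on the transcriptome data;
    (c) training machine learning models on the training set data and optimizing them to obtain the Ensemble Learning intelligent prediction model;
    (d) inputting the test set data into the Ensemble Learning intelligent prediction model to obtain the predicted cell type of each test set sample, comparing it with the sample's true cell type, and calculating the accuracy and recall of the model.
  3. The method for predicting and evaluating the function of biological materials according to claim 2, characterized in that, in step (a), the batch effect correction is based on an integrated optimization of the ComBatseq algorithm and the DaMiRseq algorithm; the sample type and batch of the training set are known; the sample type of the test set is unknown, batch effect correction of the test set is based on the parameters produced by batch effect correction of the training set, and each test set is corrected independently.
  4. The method for predicting and evaluating the function of biological materials according to claim 2, characterized in that, in step (b), the feature extraction is based on an integrated extraction with the DaMiRseq algorithm and the DESeq2 algorithm; after batch effect correction of the training set, the characteristic expressed genes of the four cell types are extracted according to sample type; and the expression matrix of the characteristic genes is extracted separately from the batch-corrected training set and test set data.
  5. The method for predicting and evaluating the function of biological materials according to claim 2, characterized in that, in step (c), the Ensemble Learning intelligent prediction model is constructed by integrating four machine learning algorithms: Ridge Classifier CV, Support Vector Machine, Decision Tree, and Gaussian Naive Bayes; the model is first trained and optimized on the training set, and its evaluation metrics are then calculated on the test set.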
PCT/CN2021/119233 2021-08-03 2021-09-18 Method for predicting and evaluating the function of biological materials WO2023010660A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110884816.5 2021-08-03
CN202110884816.5A CN113604544B (zh) 2021-08-03 2021-08-03 Method for predicting and evaluating the function of biological materials

Publications (1)

Publication Number Publication Date
WO2023010660A1 (zh) 2023-02-09

Family

ID=78339171

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/119233 WO2023010660A1 (zh) 2021-08-03 2021-09-18 Method for predicting and evaluating the function of biological materials

Country Status (2)

Country Link
CN (1) CN113604544B (zh)
WO (1) WO2023010660A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011049439A1 (en) * 2009-10-19 2011-04-28 Universiteit Twente Method for selecting bone forming mesenchymal stem cells
CN105112493A (zh) * 2015-09-21 2015-12-02 中国人民解放军第四军医大学 Method for detecting and evaluating in vitro cell morphology and osteogenic function on the surface of bone implant materials
US20160186146A1 (en) * 2014-12-31 2016-06-30 Wisconsin Alumni Research Foundation Human pluripotent stem cell-based models for predictive developmental neural toxicity
WO2016161311A1 (en) * 2015-04-02 2016-10-06 The New York Stem Cell Foundation In vitro methods for assessing tissue compatibility of a material
WO2019066421A2 (ko) * 2017-09-27 2019-04-04 이화여자대학교 산학협력단 Method for predicting cancer type based on DNA copy number variation
WO2021108556A1 (en) * 2019-11-26 2021-06-03 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Methods of identifying cell-type-specific gene expression levels by deconvolving bulk gene expression

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331642B (zh) * 2014-10-28 2017-04-12 山东大学 Ensemble learning method for identifying extracellular matrix proteins
KR101765999B1 (ko) * 2015-01-21 2017-08-08 서울대학교산학협력단 Apparatus and method for evaluating the performance of cancer biomarkers
CN105567829A (zh) * 2016-01-26 2016-05-11 大连理工大学 Method for predicting genotoxicity using human bone marrow mesenchymal stem cells
BR112018075407A2 (pt) * 2016-06-07 2019-03-19 Illumina, Inc. Genomic analysis platform for performing a sequence analysis pipeline
CN108182346B (zh) * 2016-12-08 2021-07-30 杭州康万达医药科技有限公司 Method for establishing a machine learning model for predicting siRNA toxicity to a class of cells, and application thereof
CN107045637B (zh) * 2016-12-16 2020-07-24 中国医学科学院生物医学工程研究所 Spectrum-based blood species identification instrument and identification method
EP3766000A2 (en) * 2018-03-16 2021-01-20 The United States of America as represented by the Secretary of the Department of Health and Human Services Using machine learning and/or neural networks to validate stem cells and their derivatives for use in cell therapy, drug discovery, and diagnostics
TW202002999A (zh) * 2018-04-06 2020-01-16 新加坡商細胞研究私人有限公司 Use of a substantially pure mesenchymal stem cell population of the umbilical cord amniotic membrane for generating mammalian stem cells carrying a transgene
SG11202009696WA (en) * 2018-04-13 2020-10-29 Freenome Holdings Inc Machine learning implementation for multi-analyte assay of biological samples
CN109360198A (zh) * 2018-10-08 2019-02-19 北京羽医甘蓝信息技术有限公司 Deep-learning-based bone marrow cell classification method and classification device
DE102018125324A1 (de) * 2018-10-12 2020-04-16 Universität Rostock Method for predicting a response to the therapy of diseases
CN109918708B (zh) * 2019-01-21 2022-07-26 昆明理工大学 Method for constructing a material performance prediction model based on heterogeneous ensemble learning
CN110400601A (zh) * 2019-08-23 2019-11-01 元码基因科技(无锡)有限公司 Method and device for cancer subtype classification based on targeted RNA sequencing and machine learning
WO2021113749A1 (en) * 2019-12-04 2021-06-10 Tempus Labs, Inc. Systems and methods for automating rna expression calls in a cancer prediction pipeline
CN112159791A (zh) * 2020-10-21 2021-01-01 北京大学口腔医学院 Method for promoting directed osteogenic differentiation of mesenchymal stem cells
CN112382352B (zh) * 2020-10-30 2022-12-16 华南理工大学 Machine-learning-based method for rapid assessment of structural features of metal-organic framework materials
CN112858434B (zh) * 2021-01-11 2022-08-16 北京大学口腔医学院 Cystatin B detection device, and preparation method and application thereof

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011049439A1 (en) * 2009-10-19 2011-04-28 Universiteit Twente Method for selecting bone forming mesenchymal stem cells
US20160186146A1 (en) * 2014-12-31 2016-06-30 Wisconsin Alumni Research Foundation Human pluripotent stem cell-based models for predictive developmental neural toxicity
WO2016161311A1 (en) * 2015-04-02 2016-10-06 The New York Stem Cell Foundation In vitro methods for assessing tissue compatibility of a material
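CN105112493A (zh) * 2015-09-21 2015-12-02 中国人民解放军第四军医大学 Method for detecting and evaluating in vitro cell morphology and osteogenic function on the surface of bone implant materials
WO2019066421A2 (ko) * 2017-09-27 2019-04-04 이화여자대학교 산학협력단 Method for predicting cancer type based on DNA copy number variation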
WO2021108556A1 (en) * 2019-11-26 2021-06-03 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Methods of identifying cell-type-specific gene expression levels by deconvolving bulk gene expression

Also Published As

Publication number Publication date
CN113604544A (zh) 2021-11-05
CN113604544B (zh) 2023-03-10

Similar Documents

Publication Publication Date Title
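CN108319984B (zh) Construction method and prediction method for a model predicting leaf phenotypic traits and photosynthetic characteristics of woody plants based on DNA methylation level
CN110598902A (zh) Water quality prediction method combining support vector machine and KNN
CN104899425A (zh) Variable-selection forecasting method for the silicon content of blast furnace molten iron
CN107238638A (zh) Determination method based on the relationship between physicochemical indicators of Daqu components and liquor yield and quality
CN111144440A (zh) Method and device for analyzing daily power load characteristics of dedicated-transformer users
CN116072302A (zh) Classification method for imbalanced medical data based on a biased random forest model
CN114038501B (zh) Machine-learning-based method for determining background bacteria
WO2023010660A1 (zh) Method for predicting and evaluating the function of biological materials
CN109886314B (zh) Method and device for detecting kitchen waste oil based on a PNN neural network
CN113159220B (zh) Random-forest-based method and device for evaluating empirical algorithms for concrete penetration depth
CN111128300B (zh) Method for judging effects on protein interactions based on mutation information
CN113584175A (zh) Set of molecular markers for assessing the progression risk of papillary renal cell carcinoma, and screening method and application thereof
US20230066188A1 (en) Biomarker identifying method and cell producing method
CN109215736B (zh) High-throughput detection method for the gut virome and application thereof
CN117457065A (zh) Method and system for identifying phenotype-associated cell types based on single-cell multi-omics data
CN111707728A (zh) HS-PTR-TOF-MS-based method for identifying different grades of White Peony tea
CN116721698A (zh) Chromosome karyotype prediction system, construction method, apparatus, device, and storage medium
WO2023134390A1 (en) Method for evaluating the quality of stem cells
CN114822827B (zh) Prediction system and prediction method for acute exacerbation of chronic obstructive pulmonary disease
KR102440452B1 (ko) Method for interpreting genetic variants based on nucleic acid sequence analysis
CN110517724B (zh) Method for inferring gene regulatory networks using single-cell transcription and gene knockout data
CN105095689A (zh) Electronic nose data mining method based on Venn prediction
CN113528631B (zh) Method and system for predicting sample quality in NGS sequencing
WO2023134391A1 (en) System for evaluating quality of stem cells
CN113077841B (zh) Method for predicting functional genes regulating yeast autophagy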

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21952529

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE