CN111850124A - Characteristic lincRNA expression profile combination and lung squamous carcinoma early prediction method

Publication number
CN111850124A
Authority
CN
China
Prior art keywords
lincrna
prediction
expression
sample
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010769768.0A
Other languages
Chinese (zh)
Inventor
高跃东
李文兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming Institute of Zoology of CAS
Original Assignee
Kunming Institute of Zoology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming Institute of Zoology of CAS
Priority to CN202010769768.0A
Publication of CN111850124A

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B35/00: ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B5/00: ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/112: Disease subtyping, staging or classification
    • C12Q2600/158: Expression markers
    • C12Q2600/178: Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Genetics & Genomics (AREA)
  • Public Health (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Pathology (AREA)
  • Wood Science & Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Biochemistry (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • General Engineering & Computer Science (AREA)
  • Hospice & Palliative Care (AREA)
  • Microbiology (AREA)
  • Physiology (AREA)
  • Oncology (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)

Abstract

The invention discloses a characteristic lincRNA expression profile combination and a method for early prediction of lung squamous cell carcinoma, wherein the nucleotide sequences of the characteristic lincRNA expression profile combination are shown in SEQ ID NO. 1-21. The method comprises the following steps: obtaining characteristic lincRNAs that are stably and differentially expressed in patients with early lung squamous cell carcinoma; selecting the characteristic lincRNA expression data and standardizing the data for each sample; constructing an early prediction model from the standardized data using a support vector machine; and performing early prediction based on the patient's characteristic lincRNA expression levels. The area under the ROC curve (AUC) of the characteristic lincRNA expression profile combination of the invention is 0.994. Only the relative expression levels of the 21 lincRNAs need to be measured; the support vector machine model then calculates the probability of early lung squamous cell carcinoma, which can serve as a reference for early prediction of lung squamous cell carcinoma.

Description

Characteristic lincRNA expression profile combination and lung squamous carcinoma early prediction method
Technical Field
The invention belongs to the technical field of biotechnology and medicine, and particularly relates to a characteristic lincRNA expression profile combination and an early lung squamous cell carcinoma prediction method.
Background
Lung squamous cell carcinoma (squamous cell carcinoma of the lung) accounts for 40%-51% of primary lung cancers. It is most common in elderly men and is closely associated with smoking. It usually presents as central lung cancer and tends to grow into the chest cavity; early lung squamous cell carcinoma often causes bronchial stenosis or obstructive pneumonia. Global Burden of Disease (GBD) data show that more than 3.3 million people worldwide had tracheal, bronchial or lung cancer in 2017, of whom as many as 1.27 million were in China. In 2016, these cancers caused 1.88 million deaths worldwide, accounting for 3.37% of all deaths; in China they caused 690,000 deaths, accounting for 6.62% of all deaths. Statistics show that the prevalence and mortality of tracheal, bronchial and lung cancer rose continuously worldwide from 1990 to 2017. In China, prevalence and mortality have increased year by year over the last decade, at a rate higher than the global average.
A support vector machine (SVM) is a generalized linear classifier that performs binary classification of data by supervised learning; its decision boundary is the maximum-margin hyperplane solved for the training samples. The SVM model represents instances as points in space, mapped so that instances of the separate classes are divided by as wide a margin as possible. New instances are then mapped into the same space, and their class is predicted according to which side of the margin they fall on. When the training data are linearly separable, the SVM learns the classifier by hard-margin maximization. When the training data are not linearly separable, the SVM learns the classifier using the kernel trick and soft-margin maximization. SVMs perform well on medium-sized data sets whose features have similar meaning, and are also suitable for small data sets. In general, SVMs predict well on data sets with fewer than 10,000 samples. SVMs are widely applied in disease diagnosis, tumor classification, tumor gene identification and similar tasks.
Early diagnosis of tumors has long been a difficult problem in medicine. Most existing early diagnosis methods observe the expression level of a single marker or a single class of markers, and rarely achieve an ideal diagnostic effect. Because the expression distributions of these markers in tumor patients and in the normal population partially overlap, it is difficult to define a marker cut-off that cleanly separates tumor patients from the normal population. Using a combination of multiple marker expression profiles may therefore be an effective approach to early tumor diagnosis. Long intergenic non-coding RNA (lincRNA) is a class of non-coding single-stranded RNA molecules longer than 200 nucleotides that are located in intergenic non-coding sequences. lincRNAs have no coding potential and are not conserved between species. Studies show that lincRNAs participate in regulating the expression of many genes, and that they are relatively stably expressed in the human body and easy to detect. Because the expression distribution of any single lincRNA molecule overlaps between the tumor and normal populations, it is difficult to define a cut-off value for early diagnosis from one lincRNA alone.
Therefore, a more stable prediction model based on a combination of multiple differentially expressed lincRNAs is needed to assist the early prediction of lung squamous cell carcinoma.
Disclosure of Invention
In view of the above, the invention provides a characteristic lincRNA expression profile combination and a method for early prediction of lung squamous cell carcinoma, which can accurately predict stage I/II lung squamous cell carcinoma.
In order to solve the above technical problems, the present invention discloses a combination of characteristic lincRNA expression profiles, comprising: AC026401.3, AC125807.2, AC244090.1, AL137003.2, AL355338.1, AL359643.3, AL365203.2, AP002360.1, AP003486.1, BAIAP2-DT, HEIH, LINC01503, MIAT, MIR210HG, MIR22HG, NUP50-DT, PCAT19, PSMB8-AS1, PSMG3-AS1, PVT1 and SMIM25, and the nucleotide sequences thereof are shown in SEQ ID NO.1-SEQ ID NO. 21.
The invention also discloses a lung squamous carcinoma early-stage prediction method based on the combination of characteristic lincRNA expression profiles, which comprises the following steps:
step 1, obtaining characteristic lincRNA stably and differentially expressed by a patient with early lung squamous carcinoma;
step 2, selecting characteristic lincRNA expression data, and carrying out data standardization on each sample;
step 3, constructing an early prediction model for the standardized data by using a support vector machine;
step 4, performing early prediction according to the expression levels of the patient's characteristic lincRNAs; the method is used for purposes other than the diagnosis and treatment of disease.
Optionally, obtaining the characteristic lincRNAs stably and differentially expressed in patients with early lung squamous cell carcinoma in step 1 specifically comprises:
step 1.1, downloading transcriptome data and clinical data of tumor tissues and paracancerous tissues of lung squamous cell carcinoma patients from the Genomic Data Commons Data Portal database, obtaining the sequencing read counts of the gene expression profiles of the tumor tissues of the lung squamous cell carcinoma patients, and performing logarithmic conversion;
step 1.2, selecting lincRNAs whose read counts are greater than or equal to 10 in all samples, and taking the logarithm of the read counts of all selected lincRNAs; with the total number of samples set as n, the total number of screened lincRNAs as m, v as the read counts of a lincRNA, and u as the expression value after taking the logarithm:
$$u_{ij} = \log_2 v_{ij}, \quad i \in (1, n),\ j \in (1, m) \tag{1}$$
wherein i is the sample number, j is the lincRNA number, $u_{ij}$ is the log-transformed expression value of the jth lincRNA in the ith sample, and $v_{ij}$ is the read counts of the jth lincRNA in the ith sample;
step 1.3, selecting lung squamous cell carcinoma patients with disease stages I and II, recorded as early lung squamous cell carcinoma patients, the total number of whom is recorded as n';
step 1.4, selecting lincRNAs whose coefficient of variation is smaller than 0.2 in the tumor samples and the normal samples; with μ set as the mean expression of a lincRNA in all samples and σ as the standard deviation, the coefficient of variation is calculated as:
$$cv_j = \frac{\sigma_j}{\mu_j}, \quad j \in (1, m) \tag{2}$$
wherein j is the lincRNA number, $cv$ is the coefficient of variation, $cv_j$ is the coefficient of variation of the jth lincRNA, $\sigma_j$ is the standard deviation of the jth lincRNA, and $\mu_j$ is the mean expression of the jth lincRNA; with $m_1$ set as the total number of stably expressed lincRNAs:
$$m_1 = m\{cv_j < 0.2\}, \quad j \in (1, m) \tag{3}$$
step 1.5, selecting lincRNAs differentially expressed between tumor and normal samples. The log-transformed expression values are used to calculate the log fold change f of each lincRNA between tumor and normal samples:
$$f_j = \mu_{1j} - \mu_{2j}, \quad j \in (1, m_1) \tag{4}$$
wherein j is the lincRNA number, $f_j$ is the fold change of the jth lincRNA, $\mu_{1j}$ is the mean expression of the jth lincRNA in the tumor samples, and $\mu_{2j}$ is the mean expression of the jth lincRNA in the normal samples;
the expression of each lincRNA in tumor and normal samples is then compared using the independent-samples t-test, formulated as:
$$t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{(n_1 - 1)\sigma_1^2 + (n_2 - 1)\sigma_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \tag{5}$$
wherein $n_1$ is the number of tumor samples, $n_2$ is the number of normal samples, $\mu_1$ is the mean expression of the lincRNA in the tumor samples, $\mu_2$ is the mean expression of the lincRNA in the normal samples, $\sigma_1^2$ is the variance of the lincRNA in the tumor samples, and $\sigma_2^2$ is the variance of the lincRNA in the normal samples;
the p values obtained from all t-tests are corrected using the false discovery rate (FDR); with q as the FDR-corrected value and r as the rank of a p value among the $m_1$ lincRNAs:
$$q_j = \frac{p_j \cdot m_1}{r_j}, \quad j \in (1, m_1) \tag{6}$$
wherein j is the lincRNA number, $q_j$ is the FDR-corrected value of the jth lincRNA, $p_j$ is the p value from the t-test of the jth lincRNA, and $r_j$ is the rank of the p value of the jth lincRNA among the $m_1$ lincRNAs;
finally, lincRNAs whose absolute fold change f is greater than or equal to 1 and whose FDR-corrected q value is less than or equal to 0.05 are selected and recorded as characteristic lincRNAs; with the total number of characteristic lincRNAs set as $m_2$:
$$m_2 = m_1\{|f_j| \ge 1,\ q_j \le 0.05\}, \quad j \in (1, m_1) \tag{7}$$
Optionally, selecting the characteristic lincRNA expression data and standardizing the data for each sample in step 2 is specifically:
$$u'_{ij} = \frac{u_{ij} - \mu_i}{\sigma_i}, \quad i \in (1, n),\ j \in (1, m_2) \tag{8}$$
wherein i is the sample number, j is the characteristic lincRNA number, $\mu_i$ is the mean of all characteristic lincRNA expression values of the ith sample, $\sigma_i$ is the standard deviation of all characteristic lincRNA expression values of the ith sample, $u_{ij}$ is the log-transformed characteristic lincRNA expression value, and $u'_{ij}$ is the standardized lincRNA value.
Optionally, the step 3 of constructing an early prediction model for the normalized data by using a support vector machine specifically includes:
step 3.1, grouping all samples: 80% of all samples are assigned to a training set + validation set and the remaining 20% to a test set; the training set + validation set is used for 5-fold cross-validation, that is, it is divided into 5 equal groups, and each group in turn serves as the validation set while the other 4 groups serve as the training set; for given parameters, the training set is used to construct the model and the validation set is used to check the accuracy of the model;
step 3.2, screening the optimal parameters: the SVM parameter gamma controls the width of the Gaussian kernel, and C is a regularization parameter that limits the importance of each point; the parameter grid is set as:
gamma=[0.001,0.01,0.1,1,10,100] (9)
C=[0.001,0.01,0.1,1,10,100] (10)
in cross-validation, a model is constructed with each pairwise combination of the parameters gamma and C in turn, and the validation set is then used to check the accuracy of the model; for each parameter combination, each round of the 5-fold cross-validation yields 1 accuracy, so the 5 rounds yield 5 accuracies, and the parameter combination with the highest mean accuracy over the 5 rounds is selected as the optimal parameters;
step 3.3, constructing a model using the optimal parameters and the data of the training set + validation set, and finally evaluating the model using the test set; the evaluation indices include accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic (ROC) curve (AUC). In the test set, a tumor sample predicted as tumor is defined as a true positive (TP), a normal sample predicted as tumor as a false positive (FP), a tumor sample predicted as normal as a false negative (FN), and a normal sample predicted as normal as a true negative (TN). The evaluation indices are calculated as:
$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \tag{11}$$
$$Precision = \frac{TP}{TP + FP} \tag{12}$$
$$Recall = \frac{TP}{TP + FN} \tag{13}$$
$$Specificity = \frac{TN}{TN + FP} \tag{14}$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{15}$$
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{16}$$
$$AUC = \int_0^1 TPR \, d(FPR), \quad TPR = Recall,\ FPR = 1 - Specificity \tag{17}$$
Among the above evaluation indices, accuracy, precision, recall, specificity, F1 score and AUC take values between 0 and 1. Higher accuracy indicates higher overall prediction performance of the model; higher precision indicates fewer type I errors (false positives); higher recall indicates fewer type II errors (false negatives); higher specificity indicates that fewer negative samples are wrongly predicted as positive; the F1 score is a composite index, the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and takes values between -1 and 1, where 1 represents perfect prediction, 0 represents prediction no better than random, and -1 represents complete disagreement between prediction and observation; a higher AUC indicates a higher probability that the classifier ranks a positive instance above a negative instance. Therefore, the closer the above indices are to 1, the better the prediction effect of the whole model.
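As a minimal sketch of how the confusion-matrix indices of step 3.3 could be computed, the following uses only the Python standard library. The function name `classification_metrics` and the toy counts are illustrative assumptions, not taken from the patent; the formulas are the standard definitions of accuracy, precision, recall, specificity, F1 score and MCC.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix metrics (hypothetical helper, not from the patent)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }

# Toy counts chosen for illustration only (not results from the patent).
metrics = classification_metrics(tp=45, fp=5, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against empty classes; a denominator of zero (for example, no predicted positives) would need separate handling in practice.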
step 3.4, if all evaluation indices are greater than 0.9, the model is considered to predict well, and the final prediction model is then constructed using all the data and the optimal parameter combination.
Optionally, the optimal parameters are: gamma = 0.01 and C = 1.
Optionally, performing early prediction according to the expression levels of the patient's characteristic lincRNAs in step 4 is specifically:
step 4.1, standardizing the characteristic lincRNA expression data of the sample to be predicted; with u set as the characteristic lincRNA expression value of the sample, μ as the mean of the characteristic lincRNA expression values of the sample, and σ as the standard deviation of the characteristic lincRNA expression values of the sample:
$$u'_j = \frac{u_j - \mu}{\sigma}, \quad j \in (1, m_2) \tag{18}$$
wherein j is the characteristic lincRNA number and $u'_j$ is the standardized lincRNA value;
step 4.2, substituting the standardized lincRNA values of the sample into the final prediction model for prediction. A prediction result of 1 indicates that lung squamous cell carcinoma has occurred, and a prediction result of 0 indicates that the sample is normal.
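The prediction step can be sketched as follows, assuming scikit-learn's `SVC` as the support vector machine implementation (the patent does not name a library). The training data here are synthetic stand-ins, and `standardize_rows` and `final_model` are hypothetical names; only the per-sample standardization, the 21 features, and the parameters gamma = 0.01 and C = 1 come from the text.

```python
import numpy as np
from sklearn.svm import SVC

def standardize_rows(u):
    """Per-sample standardization: subtract each row's mean, divide by its
    standard deviation (sample standard deviation is an assumption)."""
    u = np.asarray(u, dtype=float)
    mu = u.mean(axis=1, keepdims=True)
    sigma = u.std(axis=1, ddof=1, keepdims=True)
    return (u - mu) / sigma

rng = np.random.default_rng(1)
n_features = 21  # one feature per characteristic lincRNA

# Synthetic log-expression values, NOT patent data: in the "tumor" class the
# first 10 features are shifted upward so the two classes differ.
normal = rng.normal(loc=8.0, scale=1.0, size=(60, n_features))
tumor = rng.normal(loc=8.0, scale=1.0, size=(60, n_features))
tumor[:, :10] += 4.0

X_train = standardize_rows(np.vstack([normal, tumor]))
y_train = np.array([0] * 60 + [1] * 60)  # 0 = normal, 1 = tumor

# Final model built with the parameters reported in the text.
final_model = SVC(kernel="rbf", gamma=0.01, C=1).fit(X_train, y_train)

# A new sample is standardized on its own values before prediction.
new_sample = rng.normal(loc=8.0, scale=1.0, size=(1, n_features))
new_sample[:, :10] += 4.0
prediction = int(final_model.predict(standardize_rows(new_sample))[0])
print("prediction:", prediction)
```

Because each sample is standardized using only its own mean and standard deviation, no statistics from the training cohort are needed at prediction time.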
Compared with the prior art, the invention can obtain the following technical effects:
1) The invention predicts quickly: the prediction model constructed by the invention can rapidly predict large numbers of samples, and predicting 100 samples takes only a few seconds.
2) The invention is highly accurate: the prediction model constructed by the invention has high prediction accuracy and precision, and the area under the ROC curve (AUC) reaches 0.994.
3) The influence of platform heterogeneity is small: lincRNA expression values measured on different analysis platforms differ greatly, but the invention predicts from standardized characteristic lincRNA expression values and is therefore less affected by platform heterogeneity.
Of course, it is not necessary for any one product in which the invention is practiced to achieve all of the above-described technical effects simultaneously.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of data screening and model building according to the present invention;
FIG. 2 is a cross-validation parameter optimization process for a support vector machine model according to the present invention;
FIG. 3 is a diagram of a test set evaluation index for a support vector machine model according to the present invention;
FIG. 4 is a support vector machine model test set ROC curve of the present invention.
Detailed Description
The following embodiments are described in detail with reference to the accompanying drawings, so that how to implement the technical features of the present invention to solve the technical problems and achieve the technical effects can be fully understood and implemented.
The invention discloses a lung squamous carcinoma early prediction method based on lincRNA expression profile combination characteristics, which comprises the following steps:
step 1, obtaining the characteristic lincRNAs stably and differentially expressed in patients with early lung squamous cell carcinoma; the detailed flow is shown in FIG. 1 and specifically comprises:
step 1.1, downloading transcriptome data and clinical data of tumor tissues and paracancerous tissues of lung squamous cell carcinoma patients from the Genomic Data Commons Data Portal database, obtaining the sequencing read counts of the gene expression profiles of the tumor tissues of the lung squamous cell carcinoma patients, and performing logarithmic conversion;
step 1.2, selecting lincRNAs with a certain expression abundance, namely lincRNAs whose read counts are greater than or equal to 10 in all samples. The logarithm of the read counts of all selected lincRNAs is taken; with the total number of samples set as n, the total number of screened lincRNAs as m, v as the read counts of a lincRNA, and u as the expression value after taking the logarithm:
$$u_{ij} = \log_2 v_{ij}, \quad i \in (1, n),\ j \in (1, m) \tag{1}$$
wherein i is the sample number, j is the lincRNA number, $u_{ij}$ is the log-transformed expression value of the jth lincRNA in the ith sample, and $v_{ij}$ is the read counts of the jth lincRNA in the ith sample.
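The abundance filter and log transformation of step 1.2 can be illustrated with a short standard-library sketch. The function name `filter_and_log2`, the dictionary layout and the toy read counts are assumptions made for illustration; only the threshold of 10 reads in every sample and the base-2 logarithm come from the text.

```python
import math

def filter_and_log2(counts, min_count=10):
    """counts maps a lincRNA name to its read counts, one value per sample.
    Keep lincRNAs with at least min_count reads in every sample, then apply
    u_ij = log2(v_ij)."""
    kept = {}
    for name, values in counts.items():
        if all(v >= min_count for v in values):
            kept[name] = [math.log2(v) for v in values]
    return kept

# Hypothetical toy read counts for three samples (not data from the patent).
toy_counts = {
    "lincA": [16, 32, 64],
    "lincB": [8, 128, 256],  # dropped: one sample has fewer than 10 reads
}
log_expr = filter_and_log2(toy_counts)
print(log_expr)
```

Filtering before the logarithm also avoids taking the logarithm of zero counts.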
step 1.3, selecting lung squamous cell carcinoma patients with disease stages I and II, recorded as early lung squamous cell carcinoma patients, the total number of whom is recorded as n';
step 1.4, selecting lincRNAs stably expressed in the tumor samples and the normal samples, namely lincRNAs whose coefficient of variation is smaller than 0.2 in the tumor samples and the normal samples; with μ set as the mean expression of a lincRNA in all samples and σ as the standard deviation, the coefficient of variation is calculated as:
$$cv_j = \frac{\sigma_j}{\mu_j}, \quad j \in (1, m) \tag{2}$$
wherein j is the lincRNA number, $cv$ is the coefficient of variation, $cv_j$ is the coefficient of variation of the jth lincRNA, $\sigma_j$ is the standard deviation of the jth lincRNA, and $\mu_j$ is the mean expression of the jth lincRNA; with $m_1$ set as the total number of stably expressed lincRNAs:
$$m_1 = m\{cv_j < 0.2\}, \quad j \in (1, m) \tag{3}$$
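A possible reading of the stability filter in step 1.4 is sketched below. It assumes that the coefficient of variation is computed separately within the tumor group and within the normal group and that the sample standard deviation is used; the text does not settle either point. The helper names and toy values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """cv = sigma / mu for one lincRNA across samples
    (sample standard deviation is an assumption)."""
    return statistics.stdev(values) / statistics.mean(values)

def stable_lincrnas(tumor, normal, threshold=0.2):
    """Keep lincRNAs whose cv is below the threshold in both groups."""
    return [
        name
        for name in tumor
        if coefficient_of_variation(tumor[name]) < threshold
        and coefficient_of_variation(normal[name]) < threshold
    ]

# Hypothetical log-expression values (not data from the patent).
tumor = {"lincA": [10.0, 10.5, 9.5], "lincB": [2.0, 8.0, 14.0]}
normal = {"lincA": [8.0, 8.2, 7.8], "lincB": [5.0, 5.1, 4.9]}
stable = stable_lincrnas(tumor, normal)
print(stable)
```

Here `lincB` is rejected because its expression varies strongly among the tumor samples even though it is stable among the normal samples.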
step 1.5, selecting lincRNAs differentially expressed between tumor and normal samples. The log-transformed expression values are used to calculate the log fold change f of each lincRNA between tumor and normal samples:
$$f_j = \mu_{1j} - \mu_{2j}, \quad j \in (1, m_1) \tag{4}$$
wherein j is the lincRNA number, $f_j$ is the fold change of the jth lincRNA, $\mu_{1j}$ is the mean expression of the jth lincRNA in the tumor samples, and $\mu_{2j}$ is the mean expression of the jth lincRNA in the normal samples.
The expression of each lincRNA in tumor and normal samples is then compared using the independent-samples t-test, formulated as:
$$t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{(n_1 - 1)\sigma_1^2 + (n_2 - 1)\sigma_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \tag{5}$$
wherein $n_1$ is the number of tumor samples, $n_2$ is the number of normal samples, $\mu_1$ is the mean expression of the lincRNA in the tumor samples, $\mu_2$ is the mean expression of the lincRNA in the normal samples, $\sigma_1^2$ is the variance of the lincRNA in the tumor samples, and $\sigma_2^2$ is the variance of the lincRNA in the normal samples.
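The fold-change and t-test calculations of step 1.5 can be sketched with the standard library. Two assumptions are made: the log fold change is taken as the difference of the group means of the log-transformed values, and the t statistic uses the pooled-variance (Student) form of the independent-samples test. The function names and toy values are hypothetical.

```python
import math
import statistics

def log_fold_change(tumor_log, normal_log):
    """Difference of mean log2 expression, tumor minus normal."""
    return statistics.mean(tumor_log) - statistics.mean(normal_log)

def pooled_t_statistic(x1, x2):
    """Two-sample t statistic with pooled variance (assumed form)."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = statistics.mean(x1), statistics.mean(x2)
    var1, var2 = statistics.variance(x1), statistics.variance(x2)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical log2 expression of one lincRNA (not data from the patent).
tumor_log = [6.0, 7.0, 8.0]
normal_log = [4.0, 5.0, 6.0]
f = log_fold_change(tumor_log, normal_log)
t = pooled_t_statistic(tumor_log, normal_log)
print(f"log fold change = {f:.3f}, t = {t:.3f}")
```

Turning t into a p value requires the t distribution with n1 + n2 - 2 degrees of freedom, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind`) would normally be used for that step.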
The p values obtained from all t-tests are corrected using the false discovery rate (FDR); with q as the FDR-corrected value and r as the rank of a p value among the $m_1$ lincRNAs:
$$q_j = \frac{p_j \cdot m_1}{r_j}, \quad j \in (1, m_1) \tag{6}$$
wherein j is the lincRNA number, $q_j$ is the FDR-corrected value of the jth lincRNA, $p_j$ is the p value from the t-test of the jth lincRNA, and $r_j$ is the rank of the p value of the jth lincRNA among the $m_1$ lincRNAs.
Finally, lincRNAs whose absolute fold change f is greater than or equal to 1 and whose FDR-corrected q value is less than or equal to 0.05 are selected and recorded as characteristic lincRNAs; with the total number of characteristic lincRNAs set as $m_2$:
$$m_2 = m_1\{|f_j| \ge 1,\ q_j \le 0.05\}, \quad j \in (1, m_1) \tag{7}$$
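The FDR correction and the final selection rule can be sketched as follows. The sketch assumes that p values are ranked in ascending order and applies the plain rank formula q = p x m1 / r; the function names and the toy inputs are hypothetical.

```python
def fdr_q_values(p_values):
    """q_j = p_j * m1 / r_j, with r_j the ascending rank of p_j
    (ties keep their input order)."""
    m1 = len(p_values)
    order = sorted(range(m1), key=lambda j: p_values[j])
    q = [0.0] * m1
    for rank, j in enumerate(order, start=1):
        q[j] = p_values[j] * m1 / rank
    return q

def select_characteristic(names, fold_changes, q_values,
                          min_abs_fc=1.0, max_q=0.05):
    """Apply the selection rule |f_j| >= 1 and q_j <= 0.05."""
    return [
        name
        for name, f, q in zip(names, fold_changes, q_values)
        if abs(f) >= min_abs_fc and q <= max_q
    ]

# Hypothetical inputs (not data from the patent).
names = ["lincA", "lincB", "lincC", "lincD"]
fold_changes = [2.0, -1.5, 0.3, 1.2]
p_values = [0.001, 0.01, 0.02, 0.5]
q_values = fdr_q_values(p_values)
selected = select_characteristic(names, fold_changes, q_values)
print(selected)
```

Library implementations of the Benjamini-Hochberg procedure usually add a step that forces the q values to be monotone in the p values and caps them at 1; that step is omitted here to mirror the rank formula directly.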
Through the above screening, 21 characteristic lincRNAs of lung squamous cell carcinoma were finally obtained, as shown in Table 1. The nucleotide probe sequences of the 21 characteristic lincRNAs of lung squamous cell carcinoma are shown in Table 2.
TABLE 1 Characteristic lincRNAs of lung squamous cell carcinoma
[Table 1 is provided as an image in the original publication]
TABLE 2 Nucleotide probe sequences of the characteristic lincRNAs of lung squamous cell carcinoma
[Table 2 is provided as an image in the original publication]
Step 2, selecting the characteristic lincRNA expression data and standardizing the data for each sample, specifically:
$$u'_{ij} = \frac{u_{ij} - \mu_i}{\sigma_i}, \quad i \in (1, n),\ j \in (1, m_2) \tag{8}$$
where i is the sample number and j is the characteristic lincRNA number; $\mu_i$ is the mean of all characteristic lincRNA expression values of the ith sample, $\sigma_i$ is the standard deviation of all characteristic lincRNA expression values of the ith sample, $u_{ij}$ is the log-transformed characteristic lincRNA expression value, and $u'_{ij}$ is the standardized lincRNA value.
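The per-sample standardization of step 2 amounts to a z-score computed across the characteristic lincRNAs of a single sample. A minimal sketch follows; the function name is hypothetical and the use of the sample standard deviation is an assumption.

```python
import statistics

def standardize_sample(values):
    """Standardize one sample: subtract the mean of its characteristic
    lincRNA values and divide by their standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (assumed)
    return [(u - mu) / sigma for u in values]

# Hypothetical log-expression values of three lincRNAs in one sample.
sample = [4.0, 6.0, 8.0]
z = standardize_sample(sample)
print(z)
```

Because the mean and standard deviation come from the sample itself, values measured on different platforms are brought onto a common scale without reference to other samples.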
Step 3, constructing an early prediction model for the standardized data by using a support vector machine, specifically:
step 3.1, grouping all samples. 80% of all samples are assigned to a training set + validation set, and the remaining 20% to a test set. The training set + validation set is used for 5-fold cross-validation, that is, it is divided into 5 equal groups, and each group in turn serves as the validation set while the other 4 groups serve as the training set. For given parameters, the training set is used to construct the model and the validation set is used to check the accuracy of the model, as detailed in FIG. 1.
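The grouping in step 3.1 can be sketched with index lists only. The function name `split_and_folds`, the fixed seed and the sample count of 100 are assumptions for illustration; the sketch shuffles at random and does not stratify by class, which the text neither requires nor excludes.

```python
import random

def split_and_folds(n_samples, test_fraction=0.2, n_folds=5, seed=0):
    """Return test indices and a list of (training, validation) index pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    test = indices[:n_test]
    dev = indices[n_test:]  # training set + validation set
    folds = [dev[k::n_folds] for k in range(n_folds)]
    rounds = []
    for k in range(n_folds):
        validation = folds[k]
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        rounds.append((training, validation))
    return test, rounds

test_idx, cv_rounds = split_and_folds(100)
for k, (training, validation) in enumerate(cv_rounds, start=1):
    print(f"round {k}: {len(training)} training, {len(validation)} validation")
```

Each sample outside the test set serves as validation data exactly once across the 5 rounds.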
Step 3.2: screen for the optimal parameters. In the SVM, the parameter gamma controls the width of the Gaussian kernel, and C is a regularization parameter that limits the importance of each point. The parameter grid is set as:
gamma = [0.001, 0.01, 0.1, 1, 10, 100] (9)
C = [0.001, 0.01, 0.1, 1, 10, 100] (10)
In cross-validation, a model is built for each pairwise combination of gamma and C in turn, and the validation set is then used to check the model's accuracy. For each parameter combination, each round of 5-fold cross-validation yields one accuracy value, so the 5 rounds yield 5 values. The parameter combination with the highest mean accuracy over the 5 rounds is selected as optimal. FIG. 2 shows the cross-validation parameter optimization process: the cross-validation accuracy is highest, at 0.997, when gamma is 0.01 and C is 1. The optimal parameters of the model are therefore gamma = 0.01 and C = 1.
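The selection logic over the 6 × 6 grid can be sketched as below. The callback `fold_accuracy` is hypothetical: in practice it would train an SVM with a Gaussian (RBF) kernel on the training folds and return the accuracy on validation fold k. Here it is replaced by a toy function that merely peaks at gamma = 0.01 and C = 1, so the sketch runs without any machine learning library.

```python
from itertools import product
from math import log10
from statistics import mean

GAMMAS = [0.001, 0.01, 0.1, 1, 10, 100]  # grid (9)
CS = [0.001, 0.01, 0.1, 1, 10, 100]      # grid (10)


def select_best_parameters(fold_accuracy, n_folds=5):
    """Return (mean accuracy, gamma, C) for the best of the 36 combinations.

    fold_accuracy(gamma, C, k) is a hypothetical callback returning the
    validation accuracy of fold k for a model built with (gamma, C).
    """
    best = None
    for gamma, c in product(GAMMAS, CS):
        avg = mean(fold_accuracy(gamma, c, k) for k in range(n_folds))
        if best is None or avg > best[0]:
            best = (avg, gamma, c)
    return best


def toy_accuracy(gamma, c, k):
    """Stand-in scorer (not a real model): highest at gamma = 0.01, C = 1."""
    return 0.9 - 0.01 * abs(log10(gamma) + 2) - 0.01 * abs(log10(c))


best_accuracy, best_gamma, best_c = select_best_parameters(toy_accuracy)
```

Averaging over the 5 folds before comparing combinations is what keeps a single lucky fold from deciding the parameters.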
Step 3.3: build a model with the optimal parameters and the data of the training and validation sets, and finally evaluate the model with the test set. The evaluation indices are accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic (ROC) curve (AUC). In the test set, a sample that is actually tumor and predicted as tumor is counted as a true positive (TP); one that is actually normal but predicted as tumor, a false positive (FP); one that is actually tumor but predicted as normal, a false negative (FN); and one that is actually normal and predicted as normal, a true negative (TN). The evaluation indices are calculated as:
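$$\text{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{11}$$
$$\text{precision} = \frac{TP}{TP + FP} \tag{12}$$
$$\text{recall} = \frac{TP}{TP + FN} \tag{13}$$
$$\text{specificity} = \frac{TN}{TN + FP} \tag{14}$$
$$F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{15}$$
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{16}$$
(Formula (17), the AUC, appears as an image in the original publication.)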
Among the above evaluation indices, accuracy, precision, recall, specificity, F1 score and AUC return values between (0, 1). Higher accuracy indicates better overall prediction performance of the model; higher precision indicates fewer type I errors; higher recall indicates fewer type II errors; higher specificity indicates that few negative samples are misclassified as positive; the F1 score is a composite index, the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and returns a value between (-1, 1), where 1 represents perfect prediction, 0 represents prediction no better than random, and -1 represents complete disagreement between prediction and observation; a higher AUC indicates that the classifier is more likely to score a positive sample above a negative one. Therefore, the closer these indices are to 1, the better the overall prediction performance of the model.
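The count-based indices can be computed from the four confusion-matrix counts as sketched below, using the standard definitions of these metrics. The counts are hypothetical, AUC is omitted because it needs prediction scores rather than counts, and the functions do not guard against zero denominators.

```python
from math import sqrt


def evaluate(tp, fp, fn, tn):
    """Compute the count-based evaluation indices from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


def passes_threshold(indices, threshold=0.9):
    """Acceptance rule of step 3.4: every index must exceed the threshold."""
    return all(value > threshold for value in indices.values())


# Hypothetical test-set counts, not the patent's results.
indices = evaluate(tp=45, fp=2, fn=5, tn=48)
accepted = passes_threshold(indices)
```

With these made-up counts the recall is exactly 0.9, which is not greater than 0.9, so the rule rejects the model even though accuracy is 0.93.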
Step 3.4: if all evaluation indices are greater than 0.9, the model is considered to predict well. The final prediction model is then built from all the data with the optimal parameter combination.
FIG. 3 shows the accuracy, precision, recall, specificity, F1 score and MCC; all 6 indices are greater than 0.94. FIG. 4 shows the ROC curve and the AUC; the AUC on the test set is 0.994. These indices show that the model predicts well, so the final prediction model is built from all the data with the optimal parameter combination.
Step 4: perform early prediction from the patient's characteristic lincRNA expression levels, as follows:
Step 4.1: standardize the characteristic lincRNA expression data of the sample to be predicted. Letting $u$ be the characteristic lincRNA expression value of the sample, $\mu$ the mean expression of its characteristic lincRNAs, and $\sigma$ the standard deviation of its characteristic lincRNAs, the formula is:
$$u'_j = \frac{u_j - \mu}{\sigma} \tag{18}$$
where $j$ is the characteristic lincRNA number and $u'_j$ is the standardized expression value of the $j$-th characteristic lincRNA.
Ten samples were randomly selected for prediction and excluded when the final prediction model was built. Their sample numbers and standardized characteristic lincRNA values are shown in Table 3.
TABLE 3. Sample numbers and standardized characteristic lincRNA values of the 10 samples
(Table 3 appears as an image in the original publication.)
Step 4.2: substitute the standardized lincRNA values of the sample to be predicted into the final prediction model. A prediction of 1 indicates that squamous cell lung carcinoma has occurred; a prediction of 0 indicates that the sample is normal.
The sample numbers, corresponding TCGA numbers, actual states and predicted results of the 10 samples are shown in Table 4. The predictions for all 10 samples match the actual states, which shows that the invention can accurately predict squamous cell lung carcinoma at an early stage.
TABLE 4. Sample numbers, corresponding TCGA numbers, actual states and predicted states of the 10 samples
(Table 4 appears as an image in the original publication.)
In conclusion, the characteristic lincRNA expression profile combination has high prediction accuracy and can effectively support early prediction and diagnosis of squamous cell lung carcinoma. In addition, the method is platform-independent and can make predictions on data from various sources.
While the foregoing description shows and describes several preferred embodiments of the invention, it is to be understood, as noted above, that the invention is not limited to the forms disclosed herein. These forms are not to be construed as excluding other embodiments: the invention is capable of use in various other combinations, modifications and environments, and is capable of changes within the scope of the inventive concept expressed herein, commensurate with the above teachings or the skill or knowledge of the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
SEQUENCE LISTING
<110> Kunming Institute of Zoology, Chinese Academy of Sciences
<120> Characteristic lincRNA expression profile combination and lung squamous carcinoma early prediction method
<130>2019
<160>21
<170>PatentIn version 3.3
<210>1
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>1
aacggggttt caccatgttg gccatgctgg 30
<210>2
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>2
gtgacagtcc ctgtgctacc tctcaagccc 30
<210>3
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>3
catgagtgtc ggccgcagga gcccacaagt 30
<210>4
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>4
cataattcag attacagtta gtcaattaat 30
<210>5
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>5
gctccgcagg atccccgcga ggaacagctg 30
<210>6
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>6
ccagggagag gggaagggga gatgaggagt 30
<210>7
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>7
agcaagtgtc tatgacatag tttggtgggg 30
<210>8
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>8
atcccgttag gaaacaacgg aggatggggc 30
<210>9
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>9
tatgtcctta tgcccccccc ccaactatat 30
<210>10
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>10
caccacccca gcagcccggg tcccgggtgg 30
<210>11
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>11
cactccagcc tgggtgacag aacagactgt 30
<210>12
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>12
aaatgcccac gataaacaaa taataaatag 30
<210>13
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>13
ttcacatttg gcgttagggc tagtatttca 30
<210>14
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>14
cattctcaga gcacaaagac cccatgatct 30
<210>15
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>15
ataagcagcc tcaaggacca agaaccatct 30
<210>16
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>16
gctcccgccc tcccggccct gggctctcag 30
<210>17
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>17
tcccaccttt cccggcatcc caaggccaga 30
<210>18
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>18
agttgctgag aggaggccag caggcaaatt 30
<210>19
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>19
gaaaagaacg ccgggggatt tggcttaaac 30
<210>20
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>20
cccaaaatac agtctttgtg ttgccatctg 30
<210>21
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>21
cgtgctggtg gggagactgg ttgagcaggt 30

Claims (7)

1. A combination of characteristic lincRNA expression profiles comprising: AC026401.3, AC125807.2, AC244090.1, AL137003.2, AL355338.1, AL359643.3, AL365203.2, AP002360.1, AP003486.1, BAIAP2-DT, HEIH, LINC01503, MIAT, MIR210HG, MIR22HG, NUP50-DT, PCAT19, PSMB8-AS1, PSMG3-AS1, PVT1 and SMIM25, and the nucleotide sequences thereof are shown in SEQ ID NO.1-SEQ ID NO. 21.
2. An early prediction method for lung squamous carcinoma based on a characteristic lincRNA expression profile combination, characterized by comprising the following steps:
step 1, obtaining characteristic lincRNAs that are stably and differentially expressed in patients with early lung squamous carcinoma;
step 2, selecting the characteristic lincRNA expression data and standardizing the data within each sample;
step 3, building an early prediction model on the standardized data using a support vector machine;
step 4, performing early prediction from the patient's characteristic lincRNA expression levels;
the method is used for purposes other than disease diagnosis and treatment.
3. The prediction method according to claim 2, wherein obtaining, in step 1, the characteristic lincRNAs stably and differentially expressed in patients with early lung squamous carcinoma specifically comprises:
step 1.1, downloading transcriptome data and clinical data of tumor tissues and para-carcinoma tissues of patients with squamous cell lung carcinoma from the Genomic Data Commons Data Portal database to obtain the gene expression profile read counts, namely the sequencing read values, of the tumor tissues of patients with squamous cell lung carcinoma, and log-transforming them;
step 1.2, selecting lincRNAs whose read counts are greater than or equal to 10 in all samples and taking the logarithm of the read counts of all selected lincRNAs; letting n be the total number of samples, m the total number of selected lincRNAs, v the lincRNA read counts, and u the log-transformed expression value:
$$u_{ij} = \log_2 v_{ij},\quad i \in (1, n),\ j \in (1, m) \tag{1}$$
wherein $i$ is the sample number, $j$ is the lincRNA number, $u_{ij}$ is the log-transformed expression value of the $j$-th lincRNA in the $i$-th sample, and $v_{ij}$ is the read count of the $j$-th lincRNA in the $i$-th sample;
step 1.3, selecting squamous cell lung carcinoma patients with disease stages of I and II, recording the patients as squamous cell lung carcinoma early-stage patients, and recording the total number of squamous cell lung carcinoma early-stage patients as n';
step 1.4, selecting lincRNAs whose coefficient of variation is less than 0.2 in both the tumor samples and the normal samples; letting $\mu$ be the mean expression of a lincRNA across all samples and $\sigma$ its standard deviation, the coefficient of variation is calculated as:
$$c_{vj} = \frac{\sigma_j}{\mu_j} \tag{2}$$
wherein $j$ is the lincRNA number, $c_v$ is the coefficient of variation, $c_{vj}$ is the coefficient of variation of the $j$-th lincRNA, $\sigma_j$ is the standard deviation of the $j$-th lincRNA, and $\mu_j$ is the mean expression of the $j$-th lincRNA; letting $m_1$ be the total number of stably expressed lincRNAs:
$$m_1 = m\{c_{vj} < 0.2\},\quad j \in (1, m) \tag{3}$$
step 1.5, selecting lincRNAs differentially expressed between tumor and normal samples; the log-transformed expression values are used to calculate the log fold change $f$ of each lincRNA between tumor and normal samples, as follows:
$$f_j = \mu_{1j} - \mu_{2j} \tag{4}$$
wherein $j$ is the lincRNA number, $f_j$ is the fold change of the $j$-th lincRNA, $\mu_{1j}$ is the mean expression of the $j$-th lincRNA in tumor samples, and $\mu_{2j}$ is the mean expression of the $j$-th lincRNA in normal samples;
the difference in lincRNA expression between tumor and normal samples is then compared using an independent-samples t-test, formulated as:
(Formula (5), the t statistic, appears as an image in the original publication.)
wherein $n_1$ is the number of tumor samples, $n_2$ is the number of normal samples, $\mu_1$ is the mean expression of the lincRNA in tumor samples, $\mu_2$ is the mean expression of the lincRNA in normal samples, $\sigma_1^2$ is the variance of the lincRNA in tumor samples, and $\sigma_2^2$ is the variance of the lincRNA in normal samples;
the p values obtained from all t-tests are corrected using the false discovery rate (FDR); letting $q$ be the FDR-corrected value and $r$ the rank position of a p value among the $m_1$ lincRNAs:
$$q_j = \frac{p_j \times m_1}{r_j} \tag{6}$$
wherein $j$ is the lincRNA number, $q_j$ is the FDR-corrected value of the $j$-th lincRNA, $p_j$ is the p value from the t-test of the $j$-th lincRNA, and $r_j$ is the rank position of the p value of the $j$-th lincRNA among the $m_1$ lincRNAs;
finally, lincRNAs whose absolute fold change $|f|$ is greater than 1 and whose FDR-corrected q value is less than or equal to 0.05 are selected and designated characteristic lincRNAs; letting $m_2$ be the total number of characteristic lincRNAs:
$$m_2 = m_1\{|f_j| \ge 1,\ q_j \le 0.05\},\quad j \in (1, m_1) \tag{7}$$
4. The prediction method according to claim 2, wherein, in step 2, the characteristic lincRNA expression data are selected and the data are standardized within each sample, specifically:
$$u'_{ij} = \frac{u_{ij} - \mu_i}{\sigma_i} \tag{8}$$
wherein $i$ is the sample number, $j$ is the characteristic lincRNA number, $\mu_i$ is the mean expression of all characteristic lincRNAs in the $i$-th sample, $\sigma_i$ is the standard deviation of all characteristic lincRNAs in the $i$-th sample, $u_{ij}$ is the log-transformed characteristic lincRNA expression value, and $u'_{ij}$ is the standardized lincRNA value.
5. The prediction method according to claim 2, wherein the step 3 of constructing an early prediction model for the normalized data by using a support vector machine comprises:
step 3.1, grouping all samples, dividing 80% of all samples into a training set and a verification set, dividing the rest 20% of all samples into a test set, wherein the training set and the verification set are used for 5-fold cross verification, namely, dividing the training set and the verification set into 5 equal groups, taking one group as the verification set and the rest 4 groups as the training set in sequence, giving parameters, wherein the training set is used for constructing a model, and the verification set is used for checking the accuracy of the model;
step 3.2, optimal parameter screening, wherein the parameter gamma in the SVM controls the width of a Gaussian kernel, C is a regularization parameter and limits the importance of each point, and a parameter grid is set as:
gamma = [0.001, 0.01, 0.1, 1, 10, 100] (9)
C = [0.001, 0.01, 0.1, 1, 10, 100] (10)
in the cross validation, a model is constructed by sequentially using the combination of every two parameters gamma and C, then a validation set is used for checking the accuracy of the model, for each parameter combination, 1 accuracy is generated in each validation of 5-fold cross validation, 5 accuracies are generated by carrying out 5 times of validation in total, and the parameter combination with the highest average accuracy of the 5 verifications is selected as the optimal parameter;
step 3.3, building a model with the optimal parameters and the data of the training set and the validation set, and finally evaluating the model with the test set; the evaluation indices include accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic (ROC) curve (AUC); in the test set, a sample that is actually tumor and predicted as tumor is counted as a true positive (TP), one that is actually normal but predicted as tumor as a false positive (FP), one that is actually tumor but predicted as normal as a false negative (FN), and one that is actually normal and predicted as normal as a true negative (TN); the evaluation indices are calculated as:
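$$\text{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{11}$$
$$\text{precision} = \frac{TP}{TP + FP} \tag{12}$$
$$\text{recall} = \frac{TP}{TP + FN} \tag{13}$$
$$\text{specificity} = \frac{TN}{TN + FP} \tag{14}$$
$$F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{15}$$
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{16}$$
(Formula (17), the AUC, appears as an image in the original publication.)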
among the above evaluation indices, accuracy, precision, recall, specificity, F1 score and AUC return values between (0, 1); higher accuracy indicates better overall prediction performance of the model; higher precision indicates fewer type I errors; higher recall indicates fewer type II errors; higher specificity indicates that few negative samples are misclassified as positive; the F1 score is a composite index, the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and returns a value between (-1, 1), where 1 represents perfect prediction, 0 represents prediction no better than random, and -1 represents complete disagreement between prediction and observation; a higher AUC indicates that the classifier is more likely to score a positive sample above a negative one; therefore, the closer these indices are to 1, the better the overall prediction performance of the model;
step 3.4, if all evaluation indices are greater than 0.9, the model is considered to predict well, and the final prediction model is then built from all the data with the optimal parameter combination.
6. The prediction method according to claim 2, wherein the optimal parameters are: gamma is 0.01 and C is 1.
7. The prediction method according to claim 2, wherein the early prediction in step 4 from the patient's characteristic lincRNA expression levels specifically comprises:
step 4.1, standardizing the characteristic lincRNA expression data of the sample to be predicted; letting $u$ be the characteristic lincRNA expression value of the sample, $\mu$ the mean expression of its characteristic lincRNAs, and $\sigma$ the standard deviation of its characteristic lincRNAs, the formula is:
$$u'_j = \frac{u_j - \mu}{\sigma} \tag{18}$$
wherein $j$ is the characteristic lincRNA number and $u'_j$ is the standardized lincRNA value;
step 4.2, substituting the standardized lincRNA values of the sample to be predicted into the final prediction model; a prediction of 1 indicates that squamous cell lung carcinoma has occurred, and a prediction of 0 indicates that the sample is normal.
CN202010769768.0A 2020-08-04 2020-08-04 Characteristic lincRNA expression profile combination and lung squamous carcinoma early prediction method Pending CN111850124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010769768.0A CN111850124A (en) 2020-08-04 2020-08-04 Characteristic lincRNA expression profile combination and lung squamous carcinoma early prediction method

Publications (1)

Publication Number Publication Date
CN111850124A true CN111850124A (en) 2020-10-30

Family

ID=72954427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010769768.0A Pending CN111850124A (en) 2020-08-04 2020-08-04 Characteristic lincRNA expression profile combination and lung squamous carcinoma early prediction method

Country Status (1)

Country Link
CN (1) CN111850124A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117070635A (en) * 2023-10-12 2023-11-17 上海爱谱蒂康生物科技有限公司 Application of biomarker combination in preparation of kit for predicting transparent renal cell carcinoma

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363907A (en) * 2018-05-09 2018-08-03 中国科学院昆明动物研究所 A kind of adenocarcinoma of lung personalization prognostic evaluation methods based on multi-gene expression characteristic spectrum
CN108998531A (en) * 2018-08-31 2018-12-14 昆明医科大学第附属医院 Lung cancer lowers long-chain non-coding RNA marker and its application
CN109859801A (en) * 2019-02-14 2019-06-07 辽宁省肿瘤医院 A kind of model and method for building up containing seven genes as biomarker prediction lung squamous cancer prognosis

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117070635A (en) * 2023-10-12 2023-11-17 上海爱谱蒂康生物科技有限公司 Application of biomarker combination in preparation of kit for predicting transparent renal cell carcinoma
CN117070635B (en) * 2023-10-12 2024-01-26 上海爱谱蒂康生物科技有限公司 Application of biomarker combination in preparation of kit for predicting transparent renal cell carcinoma

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201030