CN111876485A - Characteristic mRNA expression profile combination and head and neck squamous cell carcinoma early prediction method - Google Patents

Publication number
CN111876485A
Authority
CN
China
Prior art keywords
mrna
prediction
sample
expression
cell carcinoma
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010775029.2A
Other languages
Chinese (zh)
Inventor
刘斐
贺轲
李文兴
安三奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong No 2 Peoples Hospital
Original Assignee
Guangdong No 2 Peoples Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong No 2 Peoples Hospital
Priority to CN202010775029.2A
Publication of CN111876485A
Legal status: Withdrawn

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00: ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00: ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/112: Disease subtyping, staging or classification
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/178: Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Public Health (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Pathology (AREA)
  • Immunology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Biochemistry (AREA)
  • Biomedical Technology (AREA)
  • Analytical Chemistry (AREA)
  • Hospice & Palliative Care (AREA)
  • Artificial Intelligence (AREA)
  • Primary Health Care (AREA)
  • Library & Information Science (AREA)
  • Oncology (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Microbiology (AREA)
  • Physiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)

Abstract

The invention discloses a characteristic mRNA expression profile combination and an early prediction method for head and neck squamous cell carcinoma, wherein the mRNA nucleotide probe sequences are shown in SEQ ID NO. 1-20. The method for evaluating the early risk of head and neck squamous cell carcinoma based on the characteristic mRNA expression profile combination has high precision and accuracy (the area under the ROC curve, AUC, is 1.000). Only the relative expression levels of the 20 mRNAs need to be obtained; a support vector machine model then calculates the probability of early-stage head and neck squamous cell carcinoma, and the result can serve as a reference for early prediction of head and neck squamous cell carcinoma.

Description

Characteristic mRNA expression profile combination and head and neck squamous cell carcinoma early prediction method
Technical Field
The invention belongs to the field of biotechnology and medicine, and particularly relates to a characteristic mRNA expression profile combination and an early prediction method of head and neck squamous cell carcinoma.
Background
Head and neck squamous cell carcinoma (HNSCC), which accounts for 90% of head and neck cancers, is a rapidly progressing and widely distributed malignant neoplasm originating in cells of the upper respiratory tract, including malignant neoplasms of the lips and oral cavity, oropharynx, hypopharynx, larynx, paranasal sinuses, and salivary glands. Head and neck squamous cell carcinoma usually begins in the squamous cells lining mucosal surfaces; its most common types are tumors located in the oral cavity and oropharynx.
A support vector machine (SVM) is a generalized linear classifier that performs binary classification in a supervised learning manner; its decision boundary is the maximum-margin hyperplane solved for the training samples. An SVM model represents instances as points in space, mapped so that the instances of the separate classes are divided by as wide a margin as possible. New instances are then mapped into the same space, and their class is predicted from the side of the margin on which they fall. When the training data are linearly separable, the SVM learns a classifier by hard-margin maximization. When the training data are not linearly separable, the SVM uses the kernel trick and soft-margin maximization. SVMs perform well on medium-sized data sets whose features have similar meaning and are also suitable for small data sets. In general, SVMs predict well on data sets with fewer than 10,000 samples. SVMs are widely applied in disease diagnosis, tumor classification, tumor gene identification, and similar tasks.
Early diagnosis of tumors has long been a difficult problem in medicine. Most existing early diagnosis methods observe the expression level of a single marker or a single class of markers, and rarely achieve an ideal diagnostic effect. Because the expression distributions of these markers in tumor patients and in the normal population partially overlap, it is difficult to define a cut-off value that cleanly separates the two groups. Using a combination of multiple marker expression signatures may therefore be an effective approach to early tumor diagnosis. Messenger RNA (mRNA) is a single-stranded ribonucleic acid that is transcribed using one strand of DNA as a template and carries the genetic information that directs protein synthesis. Compared with normal tissues, tumor tissues often show dysregulation of a large number of mRNAs, and studies have shown that this dysregulation is closely related to tumor occurrence, pathological mechanisms, and prognosis. However, because the distributions of any single mRNA in tumor patients and in the normal population overlap, it is difficult to define a critical value for early diagnosis.
Therefore, there is a need to establish a more stable prediction model based on a combination of multiple differentially expressed mRNA signatures for early prediction of head and neck squamous cell carcinoma.
Disclosure of Invention
In view of the above, the present invention provides a combination of characteristic mRNA expression profiles and an early prediction method of head and neck squamous cell carcinoma.
In order to solve the technical problem, the invention discloses a characteristic mRNA expression profile combination, which comprises AC011462.1, ARHGEF10L, BMP1, CCM2, CD276, COLGALT1, DCBLD1, GPD1L, GPT2, HOMER3, MPC1, MRGBP, P3H1, PLOD3, PRADC1, SERPINH1, SLC26A6, SMDT1, SNAI2 and TPT1, wherein the nucleotide probe sequence is shown as SEQ ID NO. 1-20.
The invention also discloses a method for early prediction of head and neck squamous cell carcinoma using the characteristic mRNA expression profile combination, which comprises the following steps:
step 1, obtaining characteristic mRNAs that are stably and differentially expressed in patients with early-stage head and neck squamous cell carcinoma;
step 2, selecting the characteristic mRNA expression data and standardizing the data for each sample;
step 3, constructing an early prediction model from the standardized data using a support vector machine;
step 4, performing early prediction according to the expression levels of the patient's characteristic mRNAs;
the method is not for disease diagnosis or treatment purposes.
Optionally, obtaining the characteristic mRNAs that are stably and differentially expressed in patients with early-stage head and neck squamous cell carcinoma in step 1 specifically comprises:
step 1.1, downloading transcriptome data and clinical data for tumor tissue and para-carcinoma tissue of head and neck squamous cell carcinoma patients from the Genomic Data Commons Data Portal database, obtaining the read counts (sequencing read values) of the tumor tissue gene expression profiles, and applying a logarithmic transformation;
step 1.2, selecting mRNAs with sufficient expression abundance, namely mRNAs whose read counts are greater than or equal to 10 in all samples; taking the logarithm of the read counts of all mRNAs, where n is the total number of samples, m is the total number of screened mRNAs, v is the read counts of an mRNA, and u is the expression value after taking the logarithm, so that:
$$u_{ij} = \log_2 v_{ij}, \quad i \in (1, n),\ j \in (1, m) \tag{1}$$
wherein i is the sample number, j is the mRNA number, $u_{ij}$ is the log-transformed expression value of the jth mRNA in the ith sample, and $v_{ij}$ is the read counts of the jth mRNA in the ith sample;
step 1.3, selecting head and neck squamous cell carcinoma patients whose disease stage is stage I or stage II, designating them early-stage head and neck squamous cell carcinoma patients, and denoting their total number as n';
step 1.4, selecting mRNAs that are stably expressed in the tumor samples and the normal samples, namely mRNAs whose coefficient of variation across the tumor and normal samples is less than 0.1; with μ the mean expression of an mRNA over all samples and σ its standard deviation, the coefficient of variation is calculated as:
$$cv_j = \frac{\sigma_j}{\mu_j} \tag{2}$$
wherein j is the mRNA number, cv is the coefficient of variation, $cv_j$ is the coefficient of variation of the jth mRNA, $\sigma_j$ is the standard deviation of the jth mRNA, and $\mu_j$ is the mean expression value of the jth mRNA; letting $m_1$ be the total number of stably expressed mRNAs:
$$m_1 = \left|\{\, j : cv_j < 0.1,\ j \in (1, m) \,\}\right| \tag{3}$$
step 1.5, selecting mRNAs that are differentially expressed between the tumor samples and the normal samples; the log-transformed expression values are used to calculate the log-scale fold change f between tumor and normal sample mRNAs:
$$f_j = \mu_{1j} - \mu_{2j} \tag{4}$$
wherein j is the mRNA number, $f_j$ is the fold change of the jth mRNA, $\mu_{1j}$ is the mean expression of the jth mRNA in the tumor samples, and $\mu_{2j}$ is the mean expression of the jth mRNA in the normal samples;
the difference in mRNA expression between tumor and normal samples is then compared using an independent-samples t-test, with the formula:
[Formula (5): equation image not reproduced in this text]
wherein $n_1$ is the number of tumor samples, $n_2$ is the number of normal samples, $\mu_1$ is the mean mRNA expression in the tumor samples, $\mu_2$ is the mean mRNA expression in the normal samples, $\sigma_1^2$ is the mRNA variance in the tumor samples, and $\sigma_2^2$ is the mRNA variance in the normal samples;
the p values obtained from all t-tests are corrected using the false discovery rate (FDR), where q is the FDR-corrected value and r is the rank of a p value among the $m_1$ mRNAs:
$$q_j = \frac{p_j \cdot m_1}{r_j} \tag{6}$$
wherein j is the mRNA number, $q_j$ is the FDR-corrected value of the jth mRNA, $p_j$ is the p value from the t-test of the jth mRNA, and $r_j$ is the rank of the p value of the jth mRNA among the $m_1$ mRNAs;
finally, mRNAs with a fold change f greater than 1 and an FDR-corrected q value less than or equal to 0.05 are selected and designated characteristic mRNAs; letting $m_2$ be the total number of characteristic mRNAs:
$$m_2 = \left|\{\, j : f_j > 1,\ q_j \le 0.05,\ j \in (1, m_1) \,\}\right| \tag{7}$$
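The abundance and stability filters of steps 1.2 and 1.4 can be sketched in a few lines of Python. This is a minimal illustration, not the filed implementation: the gene names and read counts are made up, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
import math
import statistics


def log2_expression(counts, min_count=10):
    """Keep mRNAs with read counts >= min_count in every sample, then take log2.

    counts maps an mRNA name to its read counts, one value per sample.
    """
    return {
        gene: [math.log2(v) for v in values]
        for gene, values in counts.items()
        if all(v >= min_count for v in values)
    }


def stable_mrnas(log_expr, max_cv=0.1):
    """Return mRNAs whose coefficient of variation (sd / mean) is below max_cv."""
    stable = []
    for gene, values in log_expr.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation (assumption)
        if sd / mean < max_cv:
            stable.append(gene)
    return stable


# Toy read counts for three hypothetical mRNAs across four samples.
toy_counts = {
    "GENE_A": [1024, 1100, 980, 1050],  # abundant and stable
    "GENE_B": [16, 4096, 32, 2048],     # abundant but highly variable
    "GENE_C": [5, 300, 280, 310],       # one sample below 10 reads
}
log_expr = log2_expression(toy_counts)
print(sorted(log_expr))        # GENE_C is dropped by the abundance filter
print(stable_mrnas(log_expr))  # only GENE_A passes the stability filter
```

Because every retained read count is at least 10, every log2 value is positive, so the division by the mean is safe here.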
Optionally, in step 2 the characteristic mRNA expression data are selected and the data are standardized for each sample, using the formula:
$$u'_{ij} = \frac{u_{ij} - \mu_i}{\sigma_i} \tag{8}$$
wherein i is the sample number, j is the characteristic mRNA number, $\mu_i$ is the mean expression of all characteristic mRNAs in the ith sample, $\sigma_i$ is the standard deviation of all characteristic mRNAs in the ith sample, $u_{ij}$ is the log-transformed characteristic mRNA expression value, and $u'_{ij}$ is the normalized mRNA value.
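The per-sample standardization of step 2 subtracts the sample's own mean over its characteristic mRNAs and divides by the sample's own standard deviation. A minimal sketch follows; the function name is hypothetical, the three values stand in for the 20 characteristic mRNAs, and the population standard deviation is an assumption because the text does not specify the estimator.

```python
import statistics


def standardize_sample(values):
    """Z-score one sample across its characteristic mRNAs: (u - mean) / sd."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(u - mean) / sd for u in values]


# Log2 expression of three hypothetical characteristic mRNAs in one sample.
sample = [8.0, 10.0, 12.0]
z = standardize_sample(sample)
print(z)
```

After standardization each sample has mean 0 and unit standard deviation across its characteristic mRNAs, so only the relative pattern of expression within the sample is passed to the classifier.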
Optionally, constructing the early prediction model from the standardized data using a support vector machine in step 3 specifically comprises:
step 3.1, grouping all samples: 80% of all samples are assigned to the training set plus validation set and the remaining 20% to the test set; the training set plus validation set is used for 5-fold cross-validation, that is, it is divided into 5 equal groups, and each group in turn serves as the validation set while the remaining 4 groups serve as the training set; for given parameters, the training set is used to construct the model and the validation set is used to verify the accuracy of the model;
step 3.2, screening the optimal parameters: the parameter gamma of the SVM controls the width of the Gaussian kernel, and C is a regularization parameter that limits the importance of each point; the parameter grid is set as:
gamma = [0.001, 0.01, 0.1, 1, 10, 100] (9)
C = [0.001, 0.01, 0.1, 1, 10, 100] (10)
in cross-validation, a model is constructed for each combination of the parameters gamma and C in turn, and its accuracy is then checked on the validation set; for each parameter combination, each round of the 5-fold cross-validation yields one accuracy value, so the 5 rounds yield 5 accuracy values in total; the parameter combination with the highest mean accuracy over the 5 rounds is selected as the optimal parameters;
step 3.3, constructing a model using the optimal parameters and the data of the training set plus validation set, and finally evaluating the model on the test set; the evaluation indexes include accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic (ROC) curve (AUC). In the test set, samples that are truly tumor and predicted as tumor are true positives (TP), samples that are truly normal but predicted as tumor are false positives (FP), samples that are truly tumor but predicted as normal are false negatives (FN), and samples that are truly normal and predicted as normal are true negatives (TN). The evaluation indexes are calculated as:
$$\text{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{11}$$
$$\text{precision} = \frac{TP}{TP + FP} \tag{12}$$
$$\text{recall} = \frac{TP}{TP + FN} \tag{13}$$
$$\text{specificity} = \frac{TN}{TN + FP} \tag{14}$$
$$F1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{15}$$
$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{16}$$
[Formula (17), AUC: equation image not reproduced in this text]
the accuracy, precision, recall, specificity, F1 score and AUC take values in [0, 1]; higher accuracy indicates higher overall prediction efficiency of the model; higher precision indicates fewer type I errors; higher recall indicates fewer type II errors; high specificity indicates that few negative examples are mixed into the samples predicted to be positive; the F1 score is a composite index, the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and takes values in [-1, 1], where 1 represents perfect prediction, 0 represents prediction no better than random, and -1 represents complete disagreement between prediction and observation; a higher AUC indicates that the classifier is more likely to rank a positive instance above a negative one. Therefore, the closer the above indexes are to 1, the better the overall prediction effect of the model;
step 3.4, if all evaluation indexes are greater than 0.9, the model is considered to have good predictive performance; the final prediction model is then constructed with the optimal parameter combination using all the data.
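As a minimal sketch of steps 3.1 to 3.3, the grid search with 5-fold cross-validation can be written with scikit-learn. The library choice, the synthetic data standing in for the standardized expression matrix, the random seeds, and the stratified split are all illustrative assumptions; the text only specifies the 80/20 split, the 5 folds, and the gamma and C grids.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for standardized expression: 100 samples x 20 mRNAs.
X = rng.normal(size=(100, 20))
y = np.repeat([0, 1], 50)  # 0 = normal, 1 = tumor
X[y == 1, :5] += 2.0       # shift a few features so the classes differ

# Step 3.1: 80% for training plus validation, 20% held out as the test set.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 3.2: the gamma and C grids from formulas (9) and (10).
param_grid = {
    "gamma": [0.001, 0.01, 0.1, 1, 10, 100],
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
}
# cv=5 runs 5-fold cross-validation for each of the 36 combinations and
# keeps the combination with the highest mean validation accuracy.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X_dev, y_dev)

# Step 3.3: with the default refit=True, the best combination has already been
# refitted on all of X_dev, so the held-out test set can be scored directly.
print(search.best_params_)
print(search.score(X_test, y_test))
```

Keeping the test set out of the grid search matters: the cross-validated accuracy is used to choose gamma and C, so only the held-out 20% gives an estimate that was not involved in that choice.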
Optionally, performing early prediction according to the expression levels of the patient's characteristic mRNAs in step 4 specifically comprises:
step 4.1, standardizing the characteristic mRNA expression data of the prediction sample, where u is the characteristic mRNA expression value of the prediction sample, μ is the mean characteristic mRNA expression of the prediction sample, and σ is the standard deviation of the characteristic mRNAs of the prediction sample, using the formula:
$$u'_j = \frac{u_j - \mu}{\sigma} \tag{18}$$
wherein j is the characteristic mRNA number and $u'_j$ is the normalized mRNA value;
step 4.2, inputting the normalized mRNA values of the prediction sample into the final prediction model for prediction; a prediction of 1 indicates head and neck squamous cell carcinoma, and a prediction of 0 indicates normal.
Compared with the prior art, the invention can obtain the following technical effects:
1) The prediction speed is high: the prediction model constructed by the invention can rapidly predict large numbers of samples, and predicting 100 samples takes only a few seconds.
2) The accuracy is high: the prediction model constructed by the invention has high accuracy and precision, both above 90%, and the area under the ROC curve (AUC) can reach 1.000.
3) The impact of platform heterogeneity is minor: because mRNA expression values measured on different analysis platforms differ greatly, the invention uses normalized characteristic mRNA expression values for prediction and is therefore less affected by platform heterogeneity.
Of course, a product practicing the invention need not achieve all of the above technical effects simultaneously.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the invention without unduly limiting it. In the drawings:
FIG. 1 is a flow chart of data screening and model building according to the present invention;
FIG. 2 is a cross-validation parameter optimization process for a support vector machine model according to the present invention;
FIG. 3 is a diagram of a test set evaluation index for a support vector machine model according to the present invention;
FIG. 4 is a support vector machine model test set ROC curve of the present invention.
Detailed Description
Embodiments of the invention are described in detail below with reference to the accompanying drawings, so that the way in which the invention applies technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented.
The invention discloses a method for early prediction of head and neck squamous cell carcinoma based on a characteristic mRNA expression profile combination, which can accurately predict stage I/II head and neck squamous cell carcinoma and comprises the following steps:
step 1, obtaining mRNAs that are stably and differentially expressed in patients with early-stage head and neck squamous cell carcinoma (characteristic mRNAs), specifically:
step 1.1, downloading transcriptome data and clinical data for tumor tissue and para-carcinoma tissue of head and neck squamous cell carcinoma patients from the Genomic Data Commons Data Portal database, obtaining the read counts (sequencing read values) of the tumor tissue gene expression profiles, and applying a logarithmic transformation;
step 1.2, selecting mRNAs with sufficient expression abundance, namely mRNAs whose read counts are greater than or equal to 10 in all samples. The logarithm of the read counts of all mRNAs is taken, where n is the total number of samples, m is the total number of screened mRNAs, v is the read counts of an mRNA, and u is the expression value after taking the logarithm, so that:
$$u_{ij} = \log_2 v_{ij}, \quad i \in (1, n),\ j \in (1, m) \tag{1}$$
wherein i is the sample number, j is the mRNA number, $u_{ij}$ is the log-transformed expression value of the jth mRNA in the ith sample, and $v_{ij}$ is the read counts of the jth mRNA in the ith sample.
Step 1.3, selecting head and neck squamous cell carcinoma patients whose disease stage is stage I or stage II, designating them early-stage head and neck squamous cell carcinoma patients, and denoting their total number as n';
step 1.4, selecting mRNAs that are stably expressed in the tumor samples and the normal samples, namely mRNAs whose coefficient of variation across the tumor and normal samples is less than 0.1; with μ the mean expression of an mRNA over all samples and σ its standard deviation, the coefficient of variation is calculated as:
$$cv_j = \frac{\sigma_j}{\mu_j} \tag{2}$$
wherein j is the mRNA number, cv is the coefficient of variation, $cv_j$ is the coefficient of variation of the jth mRNA, $\sigma_j$ is the standard deviation of the jth mRNA, and $\mu_j$ is the mean expression value of the jth mRNA; letting $m_1$ be the total number of stably expressed mRNAs:
$$m_1 = \left|\{\, j : cv_j < 0.1,\ j \in (1, m) \,\}\right| \tag{3}$$
step 1.5, selecting mRNAs that are differentially expressed between the tumor samples and the normal samples. The log-transformed expression values are used to calculate the log-scale fold change f between tumor and normal sample mRNAs:
$$f_j = \mu_{1j} - \mu_{2j} \tag{4}$$
wherein j is the mRNA number, $f_j$ is the fold change of the jth mRNA, $\mu_{1j}$ is the mean expression of the jth mRNA in the tumor samples, and $\mu_{2j}$ is the mean expression of the jth mRNA in the normal samples.
The difference in mRNA expression between tumor and normal samples is then compared using an independent-samples t-test, with the formula:
[Formula (5): equation image not reproduced in this text]
wherein $n_1$ is the number of tumor samples, $n_2$ is the number of normal samples, $\mu_1$ is the mean mRNA expression in the tumor samples, $\mu_2$ is the mean mRNA expression in the normal samples, $\sigma_1^2$ is the mRNA variance in the tumor samples, and $\sigma_2^2$ is the mRNA variance in the normal samples.
The p values obtained from all t-tests are corrected using the false discovery rate (FDR), where q is the FDR-corrected value and r is the rank of a p value among the $m_1$ mRNAs:
$$q_j = \frac{p_j \cdot m_1}{r_j} \tag{6}$$
wherein j is the mRNA number, $q_j$ is the FDR-corrected value of the jth mRNA, $p_j$ is the p value from the t-test of the jth mRNA, and $r_j$ is the rank of the p value of the jth mRNA among the $m_1$ mRNAs.
Finally, mRNAs with a fold change f greater than 1 and an FDR-corrected q value less than or equal to 0.05 are selected and designated characteristic mRNAs; letting $m_2$ be the total number of characteristic mRNAs:
$$m_2 = \left|\{\, j : f_j > 1,\ q_j \le 0.05,\ j \in (1, m_1) \,\}\right| \tag{7}$$
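The FDR correction and the final selection in step 1.5 can be illustrated with a short Python sketch. It follows the rule described above (each p value multiplied by the number of tested mRNAs and divided by its ascending rank) and takes the fold changes as given inputs. The gene names, p values, fold changes, and function names are made up for illustration.

```python
def fdr_q_values(p_values):
    """q_j = p_j * m1 / r_j, with r_j the ascending rank of p_j among m1 tests."""
    m1 = len(p_values)
    order = sorted(range(m1), key=lambda j: p_values[j])
    q = [0.0] * m1
    for rank, j in enumerate(order, start=1):
        q[j] = p_values[j] * m1 / rank
    return q


def select_characteristic(names, fold_changes, q_values, min_fold=1.0, max_q=0.05):
    """Keep mRNAs with fold change f > min_fold and corrected q <= max_q."""
    return [
        name
        for name, f, q in zip(names, fold_changes, q_values)
        if f > min_fold and q <= max_q
    ]


# Hypothetical inputs for four stably expressed mRNAs.
names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
p_values = [0.01, 0.04, 0.03, 0.20]
fold_changes = [1.5, 2.0, 0.4, 1.2]

q_values = fdr_q_values(p_values)
print(q_values)
print(select_characteristic(names, fold_changes, q_values))  # ['GENE_A']
```

GENE_B shows why the correction matters: its raw p value of 0.04 would pass a 0.05 threshold, but its corrected value of about 0.053 does not. Common Benjamini-Hochberg implementations additionally cap q at 1 and enforce monotonicity across ranks; those refinements are omitted here because the text does not mention them.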
step 2, selecting the characteristic mRNA expression data and standardizing the data for each sample, using the formula:
$$u'_{ij} = \frac{u_{ij} - \mu_i}{\sigma_i} \tag{8}$$
wherein i is the sample number and j is the characteristic mRNA number. $\mu_i$ is the mean expression of all characteristic mRNAs in the ith sample, $\sigma_i$ is the standard deviation of all characteristic mRNAs in the ith sample, $u_{ij}$ is the log-transformed characteristic mRNA expression value, and $u'_{ij}$ is the normalized mRNA value.
Step 3, constructing an early prediction model from the standardized data using a support vector machine, specifically:
Step 3.1, grouping all samples. 80% of all samples are assigned to the training set plus validation set, and the remaining 20% to the test set. The training set plus validation set is used for 5-fold cross-validation, that is, it is divided into 5 equal groups, and each group in turn serves as the validation set while the remaining 4 groups serve as the training set. For given parameters, the training set is used to construct the model and the validation set is used to verify the accuracy of the model.
Step 3.2, screening the optimal parameters. The parameter gamma of the SVM controls the width of the Gaussian kernel, and C is a regularization parameter that limits the importance of each point. The parameter grid is set as:
gamma = [0.001, 0.01, 0.1, 1, 10, 100] (9)
C = [0.001, 0.01, 0.1, 1, 10, 100] (10)
In cross-validation, a model is constructed for each combination of the parameters gamma and C in turn, and its accuracy is then checked on the validation set. For each parameter combination, each round of the 5-fold cross-validation yields one accuracy value, so the 5 rounds yield 5 accuracy values in total. The parameter combination with the highest mean accuracy over the 5 rounds is selected as the optimal parameters.
And 3.3, constructing a model by using the optimal parameters and the data of the training set and the verification set, and finally evaluating the model by using the test set. The evaluation index includes accuracy (accuracy), accuracy (precision), recall (call), specificity (specificity), F1 score (F1 score), Mathematic Correlation Coefficient (MCC), and area under the subject operating curve (ROC) (AUC). In the test set, the tumor counts are defined as true positive (tp), normal but predicted tumor counts as False Positive (FP), tumor counts as normal but predicted normal as False Negative (FN), and true normal and predicted normal as True Negative (TN). The above evaluation index calculation formula is:
Figure BDA0002617840460000111
Figure BDA0002617840460000112
Figure BDA0002617840460000113
Figure BDA0002617840460000121
Figure BDA0002617840460000122
Figure BDA0002617840460000123
Figure BDA0002617840460000124
the accuracy, recall, specificity, F1 score, and AUC of the above evaluation indices returned values between (0, 1). The higher the accuracy is, the higher the overall prediction efficiency of the model is; higher accuracy indicates that the class I error is smaller; higher recall indicates that a class II error is being made smaller; the high specificity indicates that few negative examples are mixed in the samples predicted to be positive examples; the F1 score is a comprehensive index which is the harmonic average of the accuracy rate and the recall rate; MCC is the correlation coefficient between observed and predicted binary classifications, returning a value between (-1, 1), where 1 represents perfect prediction, 0 represents no better than random prediction, -1 represents a complete disparity between prediction and observation; a higher AUC indicates a higher probability of a positive instance being predicted by the classifier. Therefore, the closer the above index is to 1, the better the prediction effect of the entire model is.
Step 3.4, if all the evaluation indices are greater than 0.9, the model is considered to predict well. The final prediction model is then built from all the data using the optimal parameter combination.
Step 4, performing early prediction from the expression levels of the patient's characteristic mRNAs, specifically:
Step 4.1, standardizing the characteristic mRNA expression data of the prediction sample; letting u be the characteristic mRNA expression value of the prediction sample, μ the mean characteristic mRNA expression of the prediction sample and σ the standard deviation of the characteristic mRNA expression of the prediction sample, the formula is:
$$u_j' = \frac{u_j - \mu}{\sigma} \tag{18}$$
wherein j is the characteristic mRNA number and $u_j'$ is the standardized mRNA value.
Step 4.2, substituting the standardized mRNA values of the prediction sample into the final prediction model for prediction. A prediction of 1 indicates head and neck squamous cell carcinoma, and a prediction of 0 indicates normal.
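Step 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation of the method: the standardization is taken to be (u - μ)/σ over the characteristic mRNAs of one prediction sample, the population standard deviation is assumed because the text does not say which form is used, and the function names and expression values are hypothetical.

```python
import statistics


def standardize_sample(values):
    """Standardize the characteristic mRNA values of one prediction sample."""
    mu = statistics.fmean(values)
    # Assumption: population standard deviation; the source does not specify.
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("all expression values are identical; cannot standardize")
    return [(u - mu) / sigma for u in values]


def label_to_status(label):
    """Map the model output to the status named in step 4.2."""
    return "head and neck squamous cell carcinoma" if label == 1 else "normal"


# Hypothetical log-transformed expression values of four characteristic mRNAs.
standardized = standardize_sample([8.1, 9.4, 7.7, 10.2])
```

The standardized values would then be passed to the final prediction model, and its 0/1 output translated with `label_to_status`.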
Example 1
A method for early prediction of head and neck squamous cell carcinoma based on characteristic mRNA expression profile combination comprises the following steps:
Step 1, obtaining mRNAs that are stably and differentially expressed in patients with early-stage head and neck squamous cell carcinoma (characteristic mRNAs); the detailed flow is shown in Fig. 1.
Step 1.1, downloading transcriptome data and clinical data of tumor tissues and para-carcinoma tissues of patients with head and neck squamous cell carcinoma from the Genomic Data Commons Data Portal database, obtaining the read counts values of the tumor tissue gene expression profiles of the patients, and applying a logarithmic transformation.
Step 1.2, selecting mRNAs with a certain expression abundance, namely mRNAs whose read counts are greater than or equal to 10 in all samples, as detailed in formula (1).
Step 1.3, selecting patients with head and neck squamous cell carcinoma whose disease stage is stage I or stage II and recording them as early-stage head and neck squamous cell carcinoma patients.
Step 1.4, selecting mRNAs stably expressed in the tumor and normal samples, namely mRNAs whose coefficient of variation across the tumor and normal samples is less than 0.1, as detailed in formulas (2) to (3).
Step 1.5, selecting mRNAs differentially expressed between the tumor and normal samples, as detailed in formulas (4) to (7), and recording them as characteristic mRNAs. In this example, the top 20 head and neck squamous cell carcinoma characteristic mRNAs (sorted in ascending order of FDR-corrected P value) were selected for model construction, as shown in Table 1. The nucleotide probe sequences of the 20 characteristic mRNAs are shown in Table 2.
TABLE 1 head and neck squamous cell carcinoma characteristic mRNA
Figure BDA0002617840460000131
Figure BDA0002617840460000141
TABLE 2 nucleotide Probe sequences for head and neck squamous cell carcinoma characteristic mRNAs
Figure BDA0002617840460000142
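Steps 1.4 and 1.5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this example: the log fold change is assumed to be the difference between the tumor and normal means of the log-transformed values, the FDR correction is assumed to take the form q = p × m1 / r, the population standard deviation is used, the t-test p values are supplied by the caller, and the mRNA labels and expression values are made up for illustration.

```python
import statistics

CV_MAX = 0.1     # coefficient-of-variation threshold from step 1.4
FOLD_MIN = 1.0   # absolute log fold change threshold from step 1.5
Q_MAX = 0.05     # FDR-corrected q value threshold from step 1.5


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD assumed)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def fdr_q_values(p_values):
    """q = p * m1 / r, where r is the ascending rank of p among m1 mRNAs."""
    m1 = len(p_values)
    ranked = sorted(p_values, key=p_values.get)
    return {name: p_values[name] * m1 / rank
            for rank, name in enumerate(ranked, start=1)}


def select_characteristic(tumor, normal, p_values):
    """Return the mRNAs that pass the stability and difference filters."""
    # Step 1.4: keep mRNAs whose CV over all samples is below the threshold.
    stable = [name for name in tumor
              if coefficient_of_variation(tumor[name] + normal[name]) < CV_MAX]
    # Step 1.5: correct the p values of the stable mRNAs only.
    q = fdr_q_values({name: p_values[name] for name in stable})
    selected = []
    for name in stable:
        # Assumed log fold change: difference of the log-scale group means.
        fold = statistics.fmean(tumor[name]) - statistics.fmean(normal[name])
        if abs(fold) >= FOLD_MIN and q[name] <= Q_MAX:
            selected.append(name)
    return selected


# Hypothetical log-transformed expression values and t-test p values.
tumor = {"A": [12.0, 12.2, 11.8, 12.1], "B": [5.0, 9.0, 2.0, 7.0],
         "C": [10.0, 10.1, 9.9, 10.0]}
normal = {"A": [10.5, 10.4], "B": [6.0, 5.5], "C": [10.0, 10.05]}
p_values = {"A": 0.001, "B": 0.5, "C": 0.8}
characteristic = select_characteristic(tumor, normal, p_values)
```

The common Benjamini-Hochberg procedure additionally enforces that q values do not decrease with rank; that step is left out here because the text does not describe it.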
Step 2, standardizing the data of each sample, as detailed in formula (8).
Step 3, building an early diagnosis model from the standardized data using a support vector machine.
Step 3.1, grouping all samples. 80% of all samples are assigned to the training and validation sets, and the remaining 20% to the test set. The training and validation sets are used for 5-fold cross-validation: they are divided into 5 equal groups, and each group in turn serves as the validation set while the other 4 groups serve as the training set. For given parameters, the training set is used to build the model and the validation set is used to check its accuracy. See Fig. 1 for details.
Step 3.2, screening the optimal parameters. The SVM parameter grid is set by formulas (9) to (10). In cross-validation, a model is built for each combination of the two parameters gamma and C in turn, and its accuracy is then checked on the validation set. For each parameter combination, each round of the 5-fold cross-validation yields one accuracy value, so five rounds yield five values. The parameter combination with the highest mean accuracy over the five rounds is selected as the optimal parameters. Fig. 2 shows the cross-validation parameter optimization; the cross-validation accuracy of the model is highest, 0.972, when gamma is 0.001 and C is 10. The optimal parameters of the model are therefore gamma = 0.001 and C = 10.
Step 3.3, building a model from the optimal parameters and the data of the training and validation sets, and finally evaluating the model on the test set. The evaluation indices include accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic (ROC) curve (AUC). The evaluation indices are detailed in formulas (11) to (17).
Step 3.4, Fig. 3 shows the accuracy, precision, recall, specificity, F1 score and MCC among the above evaluation indices; all 6 indices are 1.000. Fig. 4 shows the ROC curve and the AUC; the AUC on the test set is 1.000. These evaluation indices show that the model predicts well. The final prediction model is therefore built from all the data using the optimal parameter combination.
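The sample grouping and parameter screening of steps 3.1 and 3.2 can be sketched in Python as follows. This is a minimal sketch, not the implementation used in this example: the grid values are those listed for gamma and C in the claims, while `fit_and_score` is a hypothetical callable that trains an SVM on the training indices with the given gamma and C and returns its accuracy on the validation indices. The shuffling seed and the way folds are formed are also assumptions.

```python
import itertools
import random
import statistics

GAMMA_GRID = [0.001, 0.01, 0.1, 1, 10, 100]
C_GRID = [0.001, 0.01, 0.1, 1, 10, 100]


def split_samples(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a test set (20% by default)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # development set, test set


def five_folds(indices, k=5):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, folds[i]


def grid_search(dev_indices, fit_and_score):
    """Return (mean accuracy, gamma, C) of the best parameter combination."""
    best = None
    for gamma, c in itertools.product(GAMMA_GRID, C_GRID):
        accuracies = [fit_and_score(training, validation, gamma, c)
                      for training, validation in five_folds(dev_indices)]
        mean_accuracy = statistics.fmean(accuracies)
        if best is None or mean_accuracy > best[0]:
            best = (mean_accuracy, gamma, c)
    return best


# Usage with a stand-in scorer; a real scorer would train and evaluate an SVM.
dev_indices, test_indices = split_samples(100)
best = grid_search(
    dev_indices,
    lambda training, validation, gamma, c: 1.0 if (gamma, c) == (0.001, 10) else 0.5,
)
```

In practice `fit_and_score` could wrap any RBF-kernel SVM implementation; the test set stays untouched until the best combination has been chosen, as in step 3.3.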
Step 4, performing early prediction from the expression levels of the patient's characteristic mRNAs:
Step 4.1, standardizing the characteristic mRNA expression data of the prediction samples, as detailed in formula (18). Ten samples are randomly selected for prediction and are excluded when the final prediction model is built. The numbers of the 10 selected samples and their standardized characteristic mRNA values are shown in Table 3.
TABLE 3. Sample numbers and standardized characteristic mRNA values of the 10 samples
Figure BDA0002617840460000151
Figure BDA0002617840460000161
Step 4.2, substituting the standardized mRNA values of the prediction samples into the final prediction model for prediction. A prediction of 1 indicates head and neck squamous cell carcinoma, and a prediction of 0 indicates normal. The sample numbers of the 10 samples, the corresponding TCGA numbers, the actual states and the predicted results are shown in Table 4. The predictions for all 10 samples agree with the actual states, which shows that the invention can accurately predict early-stage head and neck squamous cell carcinoma.
TABLE 4. Sample numbers, corresponding TCGA numbers, actual states and predicted states of the 10 samples
Figure BDA0002617840460000162
In conclusion, the characteristic mRNA expression profile combination has high prediction accuracy and can effectively predict early-stage head and neck squamous cell carcinoma. In addition, the method is not platform-dependent and can be applied to data from various sources.
While the foregoing description shows and describes several preferred embodiments of the invention, it is to be understood, as noted above, that the invention is not limited to the forms disclosed herein and that these are not to be construed as excluding other embodiments; the invention may be used in various other combinations, modifications and environments, and may be changed within the scope of the inventive concept described herein in accordance with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention fall within the protection of the appended claims.
Figure BDA0002617840460000181
Figure BDA0002617840460000191
Figure BDA0002617840460000201
Figure BDA0002617840460000211
Figure BDA0002617840460000221
SEQUENCE LISTING
<110> second people hospital of Guangdong province
<120> combination of characteristic mRNA expression profiles and early prediction method of head and neck squamous cell carcinoma
<130>2020
<160>20
<170>PatentIn version 3.3
<210>1
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>1
gggacagtaa atgtatgggg tcgcagggtg 30
<210>2
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>2
catctggagg aaatggcctt ctttttaaaa 30
<210>3
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>3
ctccctgcag tttgacttct ttgagacaga 30
<210>4
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>4
tcagagacct taaaaagaag tttactgcaa 30
<210>5
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>5
ttgtttgatg tgcacagcgt cctgcgggtg 30
<210>6
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>6
ggtgtgggaa cttctcactc attggcttct 30
<210>7
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>7
ggagaaaaag aaaataacag gaattaggac 30
<210>8
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>8
cagccaatct gtgaatgtaa aaactacact 30
<210>9
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>9
atttgctttc aaaataaata aggtcagcta 30
<210>10
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>10
tggcagcttt ggggctgttt ttgagcttct 30
<210>11
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>11
ttgtccccac tgtttaaaaa tgttacctgt 30
<210>12
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>12
ggatccaggc tacctagagg ggcatcgggc 30
<210>13
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>13
ggaccctgct cacagccttc tacatggtgc 30
<210>14
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>14
gggatgggtc tctctgtctc cccacttcct 30
<210>15
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>15
caagggtgtc tcatgctaca agaagaggca 30
<210>16
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>16
gggaaggggg aacatgagcc tttgttgcta 30
<210>17
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>17
ggctgggcac ttcttcgatg catccatcac 30
<210>18
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>18
catctatttc ctggcttata actcccaaaa 30
<210>19
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>19
aaagtatttt tgttttgttt tgtttttgcc 30
<210>20
<211>30
<212>DNA
<213> Artificial sequence (Artificial sequence)
<400>20
gaacagagac cagaaagagt aaaacctttt 30

Claims (6)

1. A combination of characteristic mRNA expression profiles comprising AC011462.1, ARHGEF10L, BMP1, CCM2, CD276, COLGALT1, DCBLD1, GPD1L, GPT2, HOMER3, MPC1, MRGBP, P3H1, PLOD3, PRADC1, SERPINH1, SLC26A6, SMDT1, SNAI2 and TPT1, the nucleotide probe sequences of which are shown in SEQ ID No. 1-20.
2. A method for the early prediction of head and neck squamous cell carcinoma based on the combination of characteristic mRNA expression profiles according to claim 1, characterized in that it comprises the following steps:
step 1, acquiring characteristic mRNA stably and differentially expressed by a patient with head and neck squamous cell carcinoma at an early stage;
step 2, selecting characteristic mRNA expression data, and carrying out data standardization on each sample;
step 3, constructing an early prediction model for the standardized data by using a support vector machine;
step 4, early prediction is carried out according to the expression level of the mRNA which is characteristic of the patient;
the method is for purposes other than the diagnosis and treatment of disease.
3. The method for the early prediction of squamous cell carcinoma of head and neck according to claim 2, wherein the step 1 of obtaining characteristic mRNA stably and differentially expressed by patients with squamous cell carcinoma of head and neck at early stage is specifically as follows:
step 1.1, downloading transcriptome data and clinical data of tumor tissues and para-carcinoma tissues of patients with head and neck squamous cell carcinoma from the Genomic Data Commons Data Portal database, obtaining the read counts values (sequencing read values) of the tumor tissue gene expression profiles of the patients, and applying a logarithmic transformation;
step 1.2, selecting mRNAs with a certain expression abundance, namely mRNAs whose read counts are greater than or equal to 10 in all samples; taking the logarithm of the read counts of all mRNAs, where n is the total number of samples, m is the total number of screened mRNAs, v is the read counts value of an mRNA and u is the expression value after taking the logarithm, then:
Figure FDA0002617840450000011
wherein i is the sample number, j is the mRNA number, $u_{ij}$ is the log-transformed expression value of the jth mRNA in the ith sample, and $v_{ij}$ is the read counts value of the jth mRNA in the ith sample;
step 1.3, selecting head and neck squamous cell carcinoma patients with disease stages of I stage and II stage, recording the patients as head and neck squamous cell carcinoma early-stage patients, and recording the total number of the head and neck squamous cell carcinoma early-stage patients as n';
step 1.4, selecting mRNAs stably expressed in the tumor samples and the normal samples, namely mRNAs whose coefficient of variation across the tumor and normal samples is less than 0.1; letting μ be the mean expression of an mRNA in all samples and σ its standard deviation, the coefficient of variation is calculated as:
$$c_{vj} = \frac{\sigma_j}{\mu_j} \tag{2}$$
wherein j is the mRNA number, $c_v$ is the coefficient of variation, $c_{vj}$ is the coefficient of variation of the jth mRNA, $\sigma_j$ is the standard deviation of the jth mRNA and $\mu_j$ is the mean expression of the jth mRNA; letting $m_1$ be the total number of stably expressed mRNAs, then:
Figure FDA0002617840450000022
step 1.5, selecting mRNAs differentially expressed between the tumor samples and the normal samples; the log-transformed expression values are used to calculate the log fold change f of each mRNA between tumor and normal samples, with the formula:
Figure FDA0002617840450000023
wherein j is the mRNA number, $f_j$ is the fold change of the jth mRNA, $\mu_{1j}$ is the mean expression of the jth mRNA in the tumor samples, and $\mu_{2j}$ is the mean expression of the jth mRNA in the normal samples;
the difference in mRNA expression between the tumor and normal samples is then compared using an independent-samples t-test, with the formula:
Figure FDA0002617840450000031
wherein n is1Is the number of tumor samples, n2Is a normal number of samples, mu1Mean tumor sample mRNA expression, μ2Is the mean value of the mRNA expression of a normal sample,
Figure FDA0002617840450000032
for tumor samplesThe variance of the mRNA is determined by the variance of the mRNA,
Figure FDA0002617840450000033
mRNA variance for normal samples;
the p values obtained from all t-tests are corrected using the false discovery rate (FDR); letting q be the FDR-corrected value and r the rank of the p value among the $m_1$ mRNAs, then:
Figure FDA0002617840450000034
wherein j is the mRNA number, $q_j$ is the FDR-corrected value of the jth mRNA, $p_j$ is the p value from the t-test of the jth mRNA, and $r_j$ is the rank of the p value of the jth mRNA among the $m_1$ mRNAs;
finally, mRNAs whose absolute fold change |f| is greater than or equal to 1 and whose FDR-corrected q value is less than or equal to 0.05 are selected and recorded as characteristic mRNAs; letting $m_2$ be the total number of characteristic mRNAs, then:
$$m_2 = m_1\{|f_j| \ge 1,\ q_j \le 0.05\},\quad j \in (1, m_1) \tag{7}$$.
4. The method for early prediction of squamous cell carcinoma of head and neck according to claim 2, wherein in step 2 the characteristic mRNA expression data are selected and the data of each sample are standardized according to the formula:
$$u_{ij}' = \frac{u_{ij} - \mu_i}{\sigma_i} \tag{8}$$
wherein i is the sample number, j is the characteristic mRNA number, $\mu_i$ is the mean expression of all characteristic mRNAs of the ith sample, $\sigma_i$ is the standard deviation of all characteristic mRNAs of the ith sample, $u_{ij}$ is the log-transformed characteristic mRNA expression value, and $u_{ij}'$ is the standardized mRNA value.
5. The method for early prediction of squamous cell carcinoma of head and neck according to claim 2, wherein the step 3 uses a support vector machine to construct an early prediction model for the normalized data, specifically:
step 3.1, grouping all samples: 80% of all samples are assigned to the training and validation sets and the remaining 20% to the test set; the training and validation sets are used for 5-fold cross-validation, that is, they are divided into 5 equal groups, and each group in turn serves as the validation set while the other 4 groups serve as the training set; for given parameters, the training set is used to build the model and the validation set is used to check its accuracy;
step 3.2, optimal parameter screening, wherein the parameter gamma in the SVM controls the width of a Gaussian kernel, and C is a regularization parameter and limits the importance of each point; the parameter grid is set as:
gamma = [0.001, 0.01, 0.1, 1, 10, 100] (9)
C = [0.001, 0.01, 0.1, 1, 10, 100] (10)
in cross-validation, a model is built for each combination of the two parameters gamma and C in turn, and its accuracy is then checked on the validation set; for each parameter combination, each round of the 5-fold cross-validation yields one accuracy value, so five rounds yield five values; the parameter combination with the highest mean accuracy over the five rounds is selected as the optimal parameters;
step 3.3, building a model from the optimal parameters and the data of the training and validation sets, and finally evaluating the model on the test set, wherein the evaluation indices include accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic (ROC) curve (AUC); in the test set, a tumor sample predicted as tumor is defined as a true positive (TP), a normal sample predicted as tumor as a false positive (FP), a tumor sample predicted as normal as a false negative (FN), and a normal sample predicted as normal as a true negative (TN); the evaluation indices are calculated as follows:
Figure FDA0002617840450000051
Figure FDA0002617840450000052
Figure FDA0002617840450000053
Figure FDA0002617840450000054
Figure FDA0002617840450000055
Figure FDA0002617840450000056
Figure FDA0002617840450000057
accuracy, precision, recall, specificity, F1 score and AUC take values between 0 and 1; the higher the accuracy, the better the overall predictive performance of the model; higher precision indicates fewer type I errors (false positives); higher recall indicates fewer type II errors (false negatives); higher specificity indicates that fewer negative samples are wrongly predicted as positive; the F1 score is a composite index, the harmonic mean of precision and recall; MCC is the correlation coefficient between the observed and predicted binary classifications and takes values between -1 and 1, where 1 represents perfect prediction, 0 represents prediction no better than random, and -1 represents complete disagreement between prediction and observation; a higher AUC indicates a higher probability that the classifier scores a positive sample above a negative one; therefore, the closer these indices are to 1, the better the overall predictive performance of the model;
step 3.4, if the evaluation indexes are all larger than 0.9, the model has a better prediction effect; the final prediction model is constructed with the optimal parameter combinations using all the data.
6. The method for early prediction of squamous cell carcinoma of head and neck according to claim 2, wherein the early prediction in step 4 is performed according to the expression level of patient characteristic mRNA, specifically:
step 4.1, standardizing the characteristic mRNA expression data of the prediction sample; letting u be the characteristic mRNA expression value of the prediction sample, μ the mean characteristic mRNA expression of the prediction sample and σ the standard deviation of the characteristic mRNA expression of the prediction sample, the formula is:
$$u_j' = \frac{u_j - \mu}{\sigma} \tag{18}$$
wherein j is the characteristic mRNA number and $u_j'$ is the standardized mRNA value;
step 4.2, substituting the standardized mRNA values of the prediction sample into the final prediction model for prediction; a prediction of 1 indicates head and neck squamous cell carcinoma, and a prediction of 0 indicates normal.
CN202010775029.2A 2020-08-04 2020-08-04 Characteristic mRNA expression profile combination and head and neck squamous cell carcinoma early prediction method Withdrawn CN111876485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010775029.2A CN111876485A (en) 2020-08-04 2020-08-04 Characteristic mRNA expression profile combination and head and neck squamous cell carcinoma early prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010775029.2A CN111876485A (en) 2020-08-04 2020-08-04 Characteristic mRNA expression profile combination and head and neck squamous cell carcinoma early prediction method

Publications (1)

Publication Number Publication Date
CN111876485A true CN111876485A (en) 2020-11-03

Family

ID=73211646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010775029.2A Withdrawn CN111876485A (en) 2020-08-04 2020-08-04 Characteristic mRNA expression profile combination and head and neck squamous cell carcinoma early prediction method

Country Status (1)

Country Link
CN (1) CN111876485A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201103
