CN112485162A - Method for predicting gender by using blood marker - Google Patents

Publication number
CN112485162A
Authority
CN
China
Prior art keywords
blood
model
data
gender
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011278098.9A
Other languages
Chinese (zh)
Inventor
罗奇斌
申玉林
廖胜光
任毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Qiyun Nord Biomedical Co ltd
Original Assignee
Tianjin Qiyun Nord Biomedical Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-11-16
Filing date
2020-11-16
Publication date
2021-03-12
Application filed by Tianjin Qiyun Nord Biomedical Co ltd
Priority to CN202011278098.9A
Publication of CN112485162A

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N35/00 Automatic analysis not limited to methods or materials provided for in any single one of groups G01N1/00 - G01N33/00; Handling materials therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/01 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials specially adapted for biological cells, e.g. blood cells

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Hematology (AREA)
  • Molecular Biology (AREA)
  • Urology & Nephrology (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Food Science & Technology (AREA)
  • Dispersion Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Cell Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biotechnology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to the field of bioinformatics and discloses a method for predicting gender using blood markers. Compared with other methods and markers, the selected blood markers, namely blood biochemical indexes, are the most common indexes in clinical and physical examination institutions and are easy and inexpensive to obtain. A machine learning model performs the calculation on the selected blood markers, which makes the process more intelligent and the result more accurate.

Description

Method for predicting gender by using blood marker
Technical Field
The invention relates to the field of bioinformatics, and in particular to a method for predicting gender using blood markers.
Background
Gender refers to the difference between the male and female sexes. At the chromosome level, humans have 22 pairs of autosomes and one pair of sex chromosomes; the sex chromosomes are XY in males and XX in females, so the presence of a Y chromosome is one way of determining human sex. At the gene level, the testis-determining gene SRY is usually used as the basis for sex determination; because SRY is located on the Y chromosome, individuals with the SRY gene are male and individuals without it are female. At a more macroscopic level, the gonads, genitals, and the like can serve as the basis for sex classification.
Research in China and abroad has shown that some blood indexes differ between men and women, and these differences are used clinically. Red blood cell counts differ significantly between adult men and women; the difference is not apparent in the neonatal and infant stages and becomes significant only from puberty into adulthood. In China, the red blood cell count of adult men is 4.0 to 5.5 million/µL and that of adult women is 3.5 to 5.0 million/µL, but after the age of 40 the red blood cell count in women gradually increases and approaches the male level. According to medical research, the difference is related to the maturation of gonadal function. During adolescence, androgen levels in males begin to rise, and androgens stimulate the production of red blood cells in two ways. On the one hand, androgens act directly on the hematopoietic tissue of the bone marrow, accelerating the division of nucleated red blood cells and the synthesis of hemoglobin. On the other hand, androgens stimulate the kidney to produce an erythropoietic enzyme that converts an erythropoietin precursor produced by the liver into erythropoietin; erythropoietin stimulates primitive blood cells in the bone marrow to differentiate more quickly into primitive red blood cells and promotes the mitosis of nucleated red blood cells, accelerating their maturation. Erythropoietin also promotes the biosynthesis of hemoglobin, the main component of red blood cells. In addition, erythropoietin promotes the release of mature red blood cells from the bone marrow into the peripheral blood.
For the above reasons, from the onset of puberty androgen levels in males are significantly higher than in females, and estrogen does not have this androgenic function, which results in the difference in red blood cells between adult men and women. The normal reference range for hemoglobin (Hb) concentration is about 135 to 180 g/L for adult men and about 115 to 155 g/L for adult women; the hemoglobin concentration gradually increases with age, and the difference in hemoglobin between the sexes is likewise related to androgen regulation. Sex hormones also affect the regulation of glucose balance in individuals of different sexes, so blood glucose, glucose tolerance, and similar measures also differ between the sexes.
When the individual is not physically available and only a sample from the individual exists, sex cannot be determined at the macroscopic level; it can be determined only at the chromosome and gene level by obtaining the individual's chromosome or gene information. The usual approach is to amplify DNA purified from the sample by PCR to obtain that information.
Existing methods for determining or predicting the sex of an individual either require the individual to be present or use DNA information, which is difficult and costly to obtain. The present method for predicting gender using blood markers was therefore designed: blood markers that are more common and easier to obtain in clinical practice are used as feature values for predicting the sex of an individual, which reduces the cost and difficulty of sex determination and has practical significance and good application prospects.
Disclosure of Invention
In view of the above-mentioned shortcomings in the background art, the present invention provides a method for predicting gender using blood markers, which uses more easily obtained blood markers and reduces the technical cost.
In order to achieve the purpose, the invention provides the following technical scheme: a method of predicting gender using blood markers, comprising the steps of:
step one, collecting blood marker data: Qiyun Nord obtained blood marker data for a total of 92,062 samples from several relevant databases; each sample includes the individual's sex and data for 19 blood markers; the blood markers are blood biochemical indexes that commonly appear on the routine blood and blood biochemistry test reports issued by hospitals and physical examination institutions;
step two, preprocessing the data: samples with missing data and samples with obviously erroneous outliers are removed, leaving a total of 26,754 complete samples for training and testing the model; the 19 blood marker values are then normalized so that every marker value is mapped into the range [0, 1];
step three, establishing and evaluating the model: the preprocessed data are randomly divided into a training set and a test set at a ratio of 7:3; a deep neural network (DNN) machine learning algorithm is trained on the 19 blood marker values of the 26,754 samples, model parameters such as the number of hidden layers, the number of neurons, and the Dropout value are adjusted, and several gender prediction models are trained;
step four, testing the model: 30% of the 26,754 samples, randomly selected, are input into the model to predict gender and verify the model; the corresponding verification test is performed on each model, and the model with the best performance is finally selected as the gender prediction model.
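The preprocessing in step two can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the text does not say how outliers were identified, so the z-score cutoff used here is an assumption, and the function name `preprocess` and its parameters are hypothetical.

```python
from statistics import mean, pstdev

def preprocess(samples, z_cutoff=4.0):
    """Drop incomplete or outlying samples, then min-max scale each marker to [0, 1].

    `samples` is a list of equal-length lists of floats; None marks a missing value.
    The z-score cutoff is an assumption: the source does not state its outlier rule.
    """
    # 1. Keep only samples in which every marker value is present.
    complete = [s for s in samples if all(v is not None for v in s)]
    if not complete:
        raise ValueError("no complete samples")
    n_features = len(complete[0])

    # 2. Drop samples with any marker more than z_cutoff standard deviations from its mean.
    means = [mean(s[j] for s in complete) for j in range(n_features)]
    stds = [pstdev(s[j] for s in complete) for j in range(n_features)]

    def is_outlier(sample):
        return any(
            stds[j] > 0 and abs(sample[j] - means[j]) / stds[j] > z_cutoff
            for j in range(n_features)
        )

    kept = [s for s in complete if not is_outlier(s)]

    # 3. Min-max scaling: x' = (x - min) / (max - min), computed per marker.
    lows = [min(s[j] for s in kept) for j in range(n_features)]
    highs = [max(s[j] for s in kept) for j in range(n_features)]
    scaled = [
        [(s[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
         for j in range(n_features)]
        for s in kept
    ]
    return scaled, lows, highs
```

Returning `lows` and `highs` alongside the scaled data lets the same scaling be applied later to a new sample, which is needed when the trained model is used for prediction.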
Preferably, the 19 blood markers include albumin, glucose, urea, cholesterol, total protein, serum sodium, creatinine, hemoglobin, total bilirubin, triglycerides, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, serum calcium, serum potassium, hematocrit, mean corpuscular hemoglobin concentration, mean corpuscular volume, platelet count, and red blood cell count.
Preferably, the gender prediction uses a Deep Neural Network (DNN) classification algorithm.
Preferably, the gender prediction model established according to the DNN algorithm uses 19 blood markers as main features to predict the gender of the sample.
Compared with the prior art, the invention has the following beneficial effects:
1. Blood markers are used as the feature values for gender. Because the 19 selected blood markers are the most common indexes in clinical and physical examination institutions and routinely appear on routine blood and blood biochemistry test reports, they are easy to obtain and cost less than the inputs required by other methods;
2. The gender prediction model trained with the DNN algorithm is a machine learning model. It involves more computation than conventional methods but is easier to use, and each parameter and the model structure have been verified multiple times, so the method offers higher accuracy and lower difficulty of use when predicting the gender of an individual.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 shows the statistical results of model performance.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the method for predicting gender using blood markers predicts the gender of a sample under test from common blood markers. It is based on 19 blood markers found by statistical testing to be significantly related to gender, and it combines the blood marker data and phenotype data of 92,062 samples collected by the company to construct a machine learning model with a deep neural network (DNN) algorithm. In internal testing, the model built with the DNN algorithm was verified to be significantly more accurate than models built with other machine learning algorithms (k-nearest neighbors, random forest, support vector machine, etc.). The hidden layers and neurons introduced by the DNN algorithm enhance the expressive power of the model, and the DNN algorithm's automatic adjustment of neuron weights also leaves the greatest room for further development of the model. The gender prediction method comprises four steps: blood marker data collection, data preprocessing, model establishment and evaluation, and model testing. Gender is predicted from the blood markers as follows:
step one, collecting blood marker data: Qiyun Nord obtained blood marker data for a total of 92,062 samples from several relevant databases; each sample includes the individual's sex and data for 19 blood markers; the blood markers are blood biochemical indexes that commonly appear on the routine blood and blood biochemistry test reports issued by hospitals and physical examination institutions;
step two, preprocessing the data: samples with missing data and samples with obviously erroneous outliers are removed, leaving a total of 26,754 complete samples for training and testing the model; the 19 blood marker values are then normalized so that every marker value is mapped into the range [0, 1];
step three, establishing and evaluating the model: the preprocessed data are randomly divided into a training set and a test set at a ratio of 7:3; a deep neural network (DNN) machine learning algorithm is trained on the 19 blood marker values of the 26,754 samples, model parameters such as the number of hidden layers, the number of neurons, and the Dropout value are adjusted, and several gender prediction models are trained;
step four, testing the model: 30% of the 26,754 samples, randomly selected, are input into the model to predict gender and verify the model; the corresponding verification test is performed on each model, and the model with the best performance is finally selected as the gender prediction model.
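The 7:3 random division described in steps three and four can be illustrated with a short sketch. The function name `split_train_test`, the fixed seed, and the placeholder data are assumptions made for illustration; only the 7:3 ratio and the sample count of 26,754 come from the text.

```python
import random

def split_train_test(samples, labels, train_fraction=0.7, seed=0):
    """Randomly split paired samples and labels into training and test subsets."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    cut = round(train_fraction * len(indices))  # size of the training subset
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test

# Placeholder data with the sample count stated in the text (all values are dummies).
samples = [[0.0] * 19 for _ in range(26754)]
labels = [i % 2 for i in range(26754)]
(train_x, train_y), (test_x, test_y) = split_train_test(samples, labels)
print(len(train_x), len(test_x))
```

With rounding, a 7:3 split of 26,754 samples gives 18,728 training samples and 8,026 test samples. Keeping the test subset out of training is what makes the accuracy measured in step four an estimate of performance on unseen samples.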
The 19 blood markers include albumin, glucose, urea, cholesterol, total protein, serum sodium, creatinine, hemoglobin, total bilirubin, triglycerides, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, serum calcium, serum potassium, hematocrit, mean corpuscular hemoglobin concentration, mean corpuscular volume, platelet count, and red blood cell count.
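Because the model expects the 19 markers as a fixed-length input, it can help to pin down their order in code. The sketch below is illustrative only: the identifier spellings, the ordering, and the helper `to_feature_vector` are assumptions, and units are not specified in the text.

```python
# The 19 markers listed in the text; identifier spellings and order are assumptions.
BLOOD_MARKERS = (
    "albumin", "glucose", "urea", "cholesterol", "total_protein",
    "serum_sodium", "creatinine", "hemoglobin", "total_bilirubin",
    "triglycerides", "hdl_cholesterol", "ldl_cholesterol", "serum_calcium",
    "serum_potassium", "hematocrit", "mchc", "mcv",
    "platelet_count", "red_blood_cell_count",
)

def to_feature_vector(report):
    """Turn a {marker name: value} report into a list ordered like BLOOD_MARKERS."""
    missing = [name for name in BLOOD_MARKERS if report.get(name) is None]
    if missing:
        # Mirrors the preprocessing rule: samples with missing data are not used.
        raise ValueError(f"missing markers: {missing}")
    return [float(report[name]) for name in BLOOD_MARKERS]
```

Rejecting incomplete reports up front keeps the input consistent with the training data, which contained only complete samples.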
Wherein the gender prediction uses a Deep Neural Network (DNN) classification algorithm.
Wherein the gender prediction model established according to the DNN algorithm uses 19 blood markers as main features to predict the gender of the sample.
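The structural parameters that are varied between candidate models (number of hidden layers, number of neurons, activation function, Dropout) can be captured in a small configuration object. The specific candidates below are hypothetical, since the text does not disclose the structures of the models it trained; a single output unit for the two-class decision is also an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

N_MARKERS = 19  # number of input features stated in the text

@dataclass(frozen=True)
class DnnConfig:
    hidden_sizes: Tuple[int, ...]  # neurons per hidden layer
    activation: str                # name of the hidden-layer activation
    dropout_rate: float            # fraction of hidden units dropped during training

    def parameter_count(self, n_inputs: int = N_MARKERS, n_outputs: int = 1) -> int:
        """Weights plus biases of a fully connected network with these layer sizes."""
        sizes = (n_inputs, *self.hidden_sizes, n_outputs)
        return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

# Hypothetical candidates; the structures actually used are not given in the text.
CANDIDATES = [
    DnnConfig(hidden_sizes=(32,), activation="relu", dropout_rate=0.2),
    DnnConfig(hidden_sizes=(64, 32), activation="relu", dropout_rate=0.3),
]

for config in CANDIDATES:
    print(config.hidden_sizes, config.parameter_count())
```

Counting parameters this way gives a quick sense of how much capacity each candidate has relative to the number of training samples.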
The test and verification metrics of the gender prediction method are cross entropy and accuracy. By setting different numbers of neurons, numbers of hidden layers, activation functions, Dropout values, and so on, a total of 4 gender prediction models with different structures were constructed; the model performance statistics are shown in FIG. 2. The best-performing model was selected as the gender prediction model, and its performance is: cross entropy = 0.1453, accuracy = 0.9697.
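The two metrics named above can be computed as in the following sketch. It assumes a binary label encoding (0 or 1) and a model that outputs the probability of class 1; the function names, the clipping constant, and the 0.5 decision threshold are illustrative assumptions, and the example values are made up.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(1 for y, p in zip(y_true, p_pred) if (p >= threshold) == bool(y))
    return hits / len(y_true)

# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.1, 0.8, 0.6]
print(binary_cross_entropy(y_true, p_pred), accuracy(y_true, p_pred))
```

Lower cross entropy and higher accuracy both indicate a better model; cross entropy additionally penalizes confident wrong predictions, which accuracy alone does not capture.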
When the model is used, only 19 blood marker data of a sample to be detected need to be transmitted into the model, and the sex of the sample to be detected is output after calculation.
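Using the trained model on a new sample amounts to scaling the marker values with the training-set minima and maxima and running a forward pass. The sketch below assumes a fully connected network with ReLU hidden layers and a sigmoid output, and assumes that an output of 0.5 or more means "male"; none of these details, nor the function names, are stated in the text.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dense(inputs, weights, biases):
    """One fully connected layer; `weights` holds one row per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def predict_probability(features, lows, highs, layers):
    """Scale raw marker values to [0, 1], then run the network forward."""
    x = [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(features, lows, highs)]
    for weights, biases in layers[:-1]:
        # Hidden layers with ReLU; Dropout is active only during training.
        x = [max(0.0, z) for z in dense(x, weights, biases)]
    weights, biases = layers[-1]
    (logit,) = dense(x, weights, biases)  # single output neuron
    return sigmoid(logit)

def predict_gender(features, lows, highs, layers, threshold=0.5):
    """Map the output probability to a label; the 1 = male encoding is an assumption."""
    p = predict_probability(features, lows, highs, layers)
    return "male" if p >= threshold else "female"
```

In practice `layers` would hold the trained weights of the selected model and `features` would be the 19 marker values in the same order used for training.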
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A method of predicting gender using blood markers, characterized in that the method comprises four steps: blood marker data collection, data preprocessing, model establishment and evaluation, and model testing, wherein gender is predicted from the blood markers as follows:
step one, collecting blood marker data: Qiyun Nord obtained blood marker data for a total of 92,062 samples from several relevant databases; each sample includes the individual's sex and data for 19 blood markers; the blood markers are blood biochemical indexes that commonly appear on the routine blood and blood biochemistry test reports issued by hospitals and physical examination institutions;
step two, preprocessing the data: samples with missing data and samples with obviously erroneous outliers are removed, leaving a total of 26,754 complete samples for training and testing the model; the 19 blood marker values are then normalized so that every marker value is mapped into the range [0, 1];
step three, establishing and evaluating the model: the preprocessed data are randomly divided into a training set and a test set at a ratio of 7:3; a deep neural network (DNN) machine learning algorithm is trained on the 19 blood marker values of the 26,754 samples, model parameters such as the number of hidden layers, the number of neurons, and the Dropout value are adjusted, and several gender prediction models are trained;
step four, testing the model: 30% of the 26,754 samples, randomly selected, are input into the model to predict gender and verify the model; the corresponding verification test is performed on each model, and the model with the best performance is finally selected as the gender prediction model.
2. The method of predicting gender using blood markers according to claim 1, characterized in that the 19 blood markers include albumin, glucose, urea, cholesterol, total protein, serum sodium, creatinine, hemoglobin, total bilirubin, triglycerides, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, serum calcium, serum potassium, hematocrit, mean corpuscular hemoglobin concentration, mean corpuscular volume, platelet count, and red blood cell count.
3. The method of predicting gender using blood markers according to claim 1, characterized in that the gender prediction uses a deep neural network (DNN) classification algorithm.
4. The method of predicting gender using blood markers according to claim 1, characterized in that the gender prediction model built with the DNN algorithm uses the 19 blood markers as its main features to predict the gender of the sample.
CN202011278098.9A 2020-11-16 2020-11-16 Method for predicting gender by using blood marker Withdrawn CN112485162A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011278098.9A CN112485162A (en) 2020-11-16 2020-11-16 Method for predicting gender by using blood marker

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011278098.9A CN112485162A (en) 2020-11-16 2020-11-16 Method for predicting gender by using blood marker

Publications (1)

Publication Number Publication Date
CN112485162A (en) 2021-03-12

Family

ID=74930509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011278098.9A Withdrawn CN112485162A (en) 2020-11-16 2020-11-16 Method for predicting gender by using blood marker

Country Status (1)

Country Link
CN (1) CN112485162A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223714A (en) * 2021-05-11 2021-08-06 吉林大学 Gene combination for predicting preeclampsia risk, preeclampsia risk prediction model and construction method thereof
CN113223714B (en) * 2021-05-11 2022-07-05 吉林大学 Gene combination for predicting preeclampsia risk, preeclampsia risk prediction model and construction method thereof

Similar Documents

Publication Publication Date Title
CN109378072A (en) A kind of abnormal fasting blood sugar method for early warning based on integrated study Fusion Model
Martin et al. A twin-pronged attack on complex traits
CN109308545A (en) The method, apparatus, computer equipment and storage medium of diabetes probability are suffered from prediction
CN113223714B (en) Gene combination for predicting preeclampsia risk, preeclampsia risk prediction model and construction method thereof
CN113053535B (en) Medical information prediction system and medical information prediction method
CN112466402A (en) Method for predicting age by using blood marker
CN102930163A (en) Method for judging 2 type diabetes mellitus risk state
Cloninger et al. Genetic heterogeneity in alcoholism and sociopathy
CN114023449A (en) Diabetes risk early warning method and system based on depth self-encoder
CN114220540A (en) Construction method and application of diabetic nephropathy risk prediction model
CN115810394A (en) VTE risk assessment model based on polygenic mutation, construction method and application
CN109585011A (en) The Illnesses Diagnoses method and machine readable storage medium of chest pain patients
CN114974585A (en) Construction method of early risk prediction and evaluation model of metabolic syndrome in gestational period
CN112485162A (en) Method for predicting gender by using blood marker
CN111175480A (en) Method for calculating gender and age by blood biochemical indexes
CN111883258A (en) Method for constructing OHSS (OHSS) indexing type prediction model
CN111445991A (en) Method for clinical immune monitoring based on cell transcriptome data
CN113782186A (en) System for assisting in diagnosing asthenia
CN111816307A (en) Method for constructing Chinese population biological age evaluation model based on clinical marker and evaluation method
CN113035352B (en) Diabetic retinopathy early warning method based on BP neural network
Hadaegh et al. The metabolic syndrome and incident diabetes: Assessment of alternative definitions of the metabolic syndrome in an Iranian urban population
Wang et al. Expanded feature space-based gradient boosting ensemble learning for risk prediction of type 2 diabetes complications
CN110890131A (en) Method for predicting cancer risk based on hereditary gene mutation
CN109545377A (en) Obtain method for building up and the application of the model of glomerular filtration rate
CN114974570A (en) Machine learning-based old people nutrition health state assessment and risk prediction system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210312