CN110097920A - Metabolomics data missing value filling method based on neighbor stability - Google Patents
- Publication number
- CN110097920A (application number CN201910284004.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N27/00—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
- G01N27/62—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
The present invention provides a metabolomics data missing value filling method based on neighbor stability, belonging to the technical field of metabolomics data analysis. The core of the method is to measure how stable the k nearest neighbor samples of a sample with a missing metabolite are in their content of that metabolite and then, based on the stable neighbor samples, to fill different types of missing values with different strategies. The invention fills metabolomics data containing missing values well, which is of great significance for subsequent data analysis, metabolic marker selection, and similar tasks.
Description
Technical field
The invention belongs to the technical field of metabolomics data analysis and relates to a metabolomics data missing value filling method based on neighbor stability. It is a missing value filling method that takes into account the missing type of a metabolite's missing values, the similarity between samples, and the stability of the neighbor samples.
Background art
Metabolomics systematically identifies and quantifies the metabolites in an organism in order to find metabolites associated with physiological and pathological changes. Methods for identifying and quantifying metabolites include mass spectrometry and nuclear magnetic resonance spectroscopy. Metabolomics data obtained by mass spectrometry usually contain many missing values. These missing values arise mainly in two ways. First, random errors introduced during data acquisition or instrument operation cause the content of certain metabolites in a sample to go undetected; this type is called random missing. Second, the content of a metabolite in a sample is below the detection limit of the mass spectrometer and is therefore not detected; this type is called non-random missing. For example, the concentration of the metabolite bile acid varies widely in the human body, so because of the instrument detection limit, bile acid metabolites may be missing in many samples of the acquired metabolomics data. Conventional data analysis methods, however, apply only to complete data matrices without missing values. Simply deleting the metabolites or samples that contain missing values would discard much valuable information. Filling the missing data with a simple and efficient method is therefore an important task in metabolomics data analysis, and it is of great significance for subsequent data analysis, metabolic marker selection, and similar tasks.
Some methods for handling missing values in metabolomics data fill the missing values of a metabolite with zero, with the minimum content of that metabolite, with half of the minimum, or with the median. These methods are simple, but they can strongly distort subsequent data analysis. The missing value filling algorithm based on k nearest neighbors is a common way to handle missing values in metabolomics data. It assumes that the more similar two samples are, the smaller the deviation between their metabolite contents. If the content of metabolite m is missing in sample s, the algorithm finds the k nearest neighbor samples of s according to a similarity measure (a neighbor whose content of metabolite m is also missing is replaced by the next nearest neighbor) and then fills the missing content of metabolite m in sample s with the weighted average of the content of metabolite m in the k nearest neighbor samples. The k-nearest-neighbor filling algorithm handles random missing data in metabolomics data well, but its filling of non-random missing data is less satisfactory.
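The single-value fills mentioned above (zero, minimum, half of the minimum, median) can be sketched in a few lines. This is a minimal illustration, not part of the patented method; the function name `simple_fill` and the strategy labels are assumptions chosen for the example, and rows are taken to be samples with `NaN` marking a missing content.

```python
import numpy as np

def simple_fill(X, strategy="half_min"):
    """Fill NaNs per metabolite (column) with one constant derived from that column."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for m in range(X.shape[1]):
        col = X[:, m]
        missing = np.isnan(col)
        # Nothing to fill, or nothing observed to derive a value from.
        if not missing.any() or missing.all():
            continue
        if strategy == "zero":
            value = 0.0
        elif strategy == "min":
            value = np.nanmin(col)
        elif strategy == "half_min":
            value = np.nanmin(col) / 2.0
        elif strategy == "median":
            value = np.nanmedian(col)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        filled[missing, m] = value
    return filled
```

Every missing entry of a metabolite receives the same value, which is why these fills tend to shrink the variance of that metabolite and can bias later analysis.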
The present invention proposes a metabolomics data missing value filling method based on neighbor stability. The method determines the k nearest neighbor samples of a sample containing a missing metabolite from the Euclidean distance between samples, evaluates the stability of the neighbor samples, and, based on the stable neighbor samples, fills different types of missing values with the corresponding strategy.
Summary of the invention
The purpose of the present invention is to fill the missing values in metabolomics data. The core of the method is to measure how stable the k nearest neighbor samples of a sample with a missing metabolite are in their content of that metabolite and, based on the stable neighbor samples, to fill different types of missing values with different strategies.
To achieve this purpose, the technical solution adopted by the invention is as follows:
A metabolomics data missing value filling method based on neighbor stability, with the following steps:
Metabolites in biological samples are detected by mass spectrometry to obtain the spectral data of the metabolites. The spectral data are analyzed with preprocessing operations such as peak identification, peak matching, and normalization, and the metabolite content in each sample is determined, yielding the metabolomics data.
Let n denote the number of samples in the metabolomics data and p the number of metabolites in a sample. $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ denotes the vector of the contents of the p metabolites in the i-th sample, $1 \le i \le n$. When the content of metabolite m in sample $x_i$ is missing (that is, $x_{im}$ is a missing value), $1 \le m \le p$, the missing value $x_{im}$ is filled by the following steps:
(1) Compute the Euclidean distance $d(x_i, x_j)$ between sample $x_i$ and every other sample $x_j$ ($1 \le j \le n$, $j \ne i$) by formula (1), in which $o_{il}$ indicates whether the content of the l-th metabolite ($1 \le l \le p$) of sample $x_i$ is missing: $o_{il} = 0$ when the content of the l-th metabolite of $x_i$ is missing, and $o_{il} = 1$ otherwise. The sum $\sum_{l=1}^{p} o_{il} o_{jl}$ is the number of metabolites whose content is missing in neither $x_i$ nor $x_j$. The smaller the distance $d(x_i, x_j)$, the higher the similarity between $x_i$ and $x_j$. The k samples most similar to $x_i$ according to the Euclidean distance form the sample set $S_k$;
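Formula (1) itself is not reproduced in the text of step (1). One reading that agrees with the definitions of $o_{il}$ and of the jointly observed count is a Euclidean distance taken over the jointly observed metabolites and normalized by their number. This is an assumption, not a quotation of the patent:

$$d(x_i, x_j) = \sqrt{\frac{\sum_{l=1}^{p} o_{il}\, o_{jl}\, (x_{il} - x_{jl})^2}{\sum_{l=1}^{p} o_{il}\, o_{jl}}}$$

A minimal sketch under that assumption follows. The names `partial_euclidean` and `k_nearest` are hypothetical, `NaN` marks a missing content, and returning infinity when two samples share no observed metabolite is a choice made for this example.

```python
import numpy as np

def partial_euclidean(xi, xj):
    """Distance over metabolites observed in both samples (assumed form of formula (1))."""
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    both = ~np.isnan(xi) & ~np.isnan(xj)   # o_il * o_jl == 1
    n_obs = int(both.sum())                # sum_l o_il * o_jl
    if n_obs == 0:
        return float("inf")                # assumption: no shared metabolite means "infinitely far"
    diff = xi[both] - xj[both]
    return float(np.sqrt(np.sum(diff ** 2) / n_obs))

def k_nearest(X, i, k):
    """Indices of the k samples closest to sample i, nearest first (the set S_k)."""
    X = np.asarray(X, dtype=float)
    dists = [(partial_euclidean(X[i], X[j]), j) for j in range(len(X)) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]
```

Normalizing by the jointly observed count keeps distances comparable between sample pairs that share different numbers of observed metabolites.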
(2) Determine the missing type of the metabolite.
Compute the Pearson correlation coefficient between metabolite m and every other metabolite. The metabolite most strongly correlated with m, denoted aux_m, is taken as the reference metabolite of m. The missing type of metabolite m is determined from the content distribution of the reference metabolite aux_m as follows:
Let $S_{miss} = \{x_j \mid x_{jm} \text{ is a missing value}, 1 \le j \le n\}$ denote the set of samples in the metabolomics data in which metabolite m is a missing value, and let $S_{obs} = \{x_j \mid x_{jm} \text{ is not a missing value}, 1 \le j \le n\}$ denote the set of samples in which metabolite m is not a missing value. Compute the average content of the reference metabolite aux_m on $S_{miss}$ and on $S_{obs}$, denoted $\mu_{miss}$ and $\mu_{obs}$. If m and aux_m are positively correlated, the missing type of m is non-random missing when $\mu_{miss} < \mu_{obs}$, and the method proceeds to step (3); otherwise the missing type of m is random missing, and the method proceeds to step (4). If m and aux_m are negatively correlated, the missing type of m is non-random missing when $\mu_{miss} > \mu_{obs}$, and the method proceeds to step (3); otherwise the missing type of m is random missing, and the method proceeds to step (4).
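The decision rule of step (2) can be sketched as below. This is an illustrative reading under stated assumptions: the function name `classify_missing_type` and the labels `"nonrandom"` and `"random"` are hypothetical, correlations are computed only on samples where both metabolites are observed, and at least three such samples are required before a metabolite is considered as a reference. The text does not specify these details.

```python
import numpy as np

def classify_missing_type(X, m):
    """Return (missing_type, aux) for metabolite column m; rows are samples, NaN is missing."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X[:, m])                    # membership in S_miss
    if not miss.any():
        raise ValueError("metabolite m has no missing values")
    best_r, aux = 0.0, None
    for c in range(X.shape[1]):
        if c == m:
            continue
        ok = ~miss & ~np.isnan(X[:, c])         # samples observed for both metabolites
        if ok.sum() < 3:                        # assumption: too few points to correlate
            continue
        a, b = X[ok, m], X[ok, c]
        if a.std() == 0 or b.std() == 0:        # Pearson r undefined for constant vectors
            continue
        r = np.corrcoef(a, b)[0, 1]
        if abs(r) > abs(best_r):
            best_r, aux = r, c
    if aux is None:
        return "random", None                   # assumption: no usable reference metabolite
    mu_miss = np.nanmean(X[miss, aux])          # mean of aux_m over S_miss
    mu_obs = np.nanmean(X[~miss, aux])          # mean of aux_m over S_obs
    nonrandom = (best_r > 0 and mu_miss < mu_obs) or (best_r < 0 and mu_miss > mu_obs)
    return ("nonrandom" if nonrandom else "random"), aux
```

The intuition is that a value lost to the detection limit should coincide with a low level of a positively correlated reference metabolite, or a high level of a negatively correlated one.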
(3) Handling of the non-random missing type.
When the content of metabolite m is missing for some samples in $S_k$, the missing content of m in those samples is temporarily filled with the minimum content of metabolite m over all samples in the metabolomics data. This step addresses the case in which the metabolomics data contain non-random missing data. A non-random missing value occurs because the metabolite content is below the instrument detection limit and is therefore not detected. Temporarily filling the missing content of m in the neighbor samples with the minimum content of metabolite m better matches the nature of non-random missing data.
(4) Handling of the random missing type.
When the content of metabolite m is missing for some samples in $S_k$, the missing content of m in those samples is temporarily filled with the average content of metabolite m over the samples in $S_k$ whose content of m is not missing. When the content of metabolite m is missing for all samples in $S_k$, the missing content of m in the samples of $S_k$ is temporarily filled with the minimum content of metabolite m over all remaining samples in the metabolomics data.
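The temporary fills of steps (3) and (4) can be sketched together. The function name `temp_fill_neighbors` and the `kind` labels are assumptions made for this example; `neighbors` holds the row indices of $S_k$ and `NaN` marks a missing content.

```python
import numpy as np

def temp_fill_neighbors(X, neighbors, m, kind):
    """Return the metabolite-m contents of the neighbor samples with gaps temporarily filled."""
    X = np.asarray(X, dtype=float)
    vals = X[list(neighbors), m].copy()
    missing = np.isnan(vals)
    if not missing.any():
        return vals
    # Minimum observed content of metabolite m over the whole data set. When every
    # neighbor is missing, this equals the minimum over the remaining samples.
    global_min = np.nanmin(X[:, m])
    if kind == "nonrandom" or missing.all():
        vals[missing] = global_min               # step (3), or the fallback of step (4)
    else:
        vals[missing] = vals[~missing].mean()    # step (4): mean of the observed neighbors
    return vals
```

With the neighbor contents of the worked example, {3, 9, 5, 7, NaN, 6} under the random missing type, the gap is filled with 6, the mean of the five observed values.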
(5) Determine the stable neighbor samples.
The stable neighbor samples in $S_k$ are determined from how much the content of metabolite m fluctuates among the samples in $S_k$. Compute the mean $\mu$ and the standard deviation $\sigma$ of the content of metabolite m over the samples in $S_k$. Any sample in $S_k$ whose content of metabolite m lies outside the range $[\mu - \sigma, \mu + \sigma]$ is deleted from $S_k$, which finally yields the stable neighbor sample set $S'_k$. Because the metabolite content deviates only slightly between neighbor samples, deleting the samples outside $[\mu - \sigma, \mu + \sigma]$ reduces the influence of outliers and so keeps the computed fill value stable and reliable.
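The neighbor filter of step (5) can be sketched as follows. The function name `stable_neighbors` is hypothetical. The use of the sample standard deviation (`ddof=1`) is an inference from the worked example in this document, where the values {3, 9, 5, 7, 6, 6} are given $\sigma = 2$; the text does not state which estimator is intended.

```python
import numpy as np

def stable_neighbors(neighbor_ids, values, ddof=1):
    """Keep neighbors whose metabolite-m content lies within [mu - sigma, mu + sigma]."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()
    sigma = values.std(ddof=ddof)   # ddof=1 is an assumption inferred from the worked example
    keep = (values >= mu - sigma) & (values <= mu + sigma)
    kept_ids = [n for n, k in zip(neighbor_ids, keep) if k]
    return kept_ids, float(mu), float(sigma)
```

For the worked example, `stable_neighbors([3, 2, 7, 8, 9, 10], [3, 9, 5, 7, 6, 6])` keeps samples 7, 8, 9, and 10, which matches the stable set given there.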
(6) Compute the weighted average of the content of metabolite m over the samples in $S'_k$, and fill the missing content of metabolite m in sample $x_i$ with the value $x_{im}$ given by formula (3). The formulas are as follows:

$$w(x_i, s_j) = \frac{1 / d(x_i, s_j)}{\sum_{l=1}^{k'} 1 / d(x_i, s_l)} \qquad (2)$$

$$x_{im} = \sum_{l=1}^{k'} w(x_i, s_l)\, s_{lm} \qquad (3)$$

Here $k' = |S'_k|$ is the number of samples in the set $S'_k$; $s_j, s_l$ ($1 \le j, l \le k'$) are samples in $S'_k$; $w(x_i, s_j)$ is the weight that the content of metabolite m of sample $s_j$ carries in the computation of $x_{im}$; $d(x_i, s_j)$ is the Euclidean distance between $x_i$ and $s_j$ computed by formula (1); and $s_{lm}$ is the content of metabolite m of sample $s_l$. The contents of m in different neighbor samples receive different weights according to the distance between the neighbor sample and $x_i$. The smaller the distance between a sample in $S'_k$ and $x_i$, the larger the weight of its content of metabolite m, and the larger its share in the computation of $x_{im}$.
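The weighting of step (6) can be sketched with inverse-distance weights, a form that reproduces the weights quoted in the worked example of this document (0.29, 0.25, 0.25, 0.21 for distances 2.29, 2.71, 2.74, 3.16). The function name `weighted_fill` is hypothetical, and the handling of a zero distance is an assumption added so that the sketch never divides by zero.

```python
import numpy as np

def weighted_fill(distances, values):
    """Inverse-distance weighted average of neighbor contents; returns (estimate, weights)."""
    d = np.asarray(distances, dtype=float)
    v = np.asarray(values, dtype=float)
    zero = d == 0
    if zero.any():
        # Assumption: neighbors identical to the target share all of the weight equally.
        w = zero / zero.sum()
    else:
        inv = 1.0 / d
        w = inv / inv.sum()          # weights sum to 1; closer neighbors weigh more
    return float(np.sum(w * v)), w

# Values from the worked example: stable neighbors x7, x8, x9, x10.
estimate, weights = weighted_fill([2.29, 2.71, 2.74, 3.16], [5, 7, 6, 6])
print(round(estimate, 2), np.round(weights, 2))
```

Because the weights are normalized, the estimate always lies between the smallest and largest neighbor content.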
Beneficial effects of the present invention:
The invention fills missing data in metabolomics data. It takes the missing type of each metabolite into account and fills missing values of different types with different strategies. It also screens the neighbor samples and filters out the unstable ones. The invention fills metabolomics data containing missing values well, which is of great significance for subsequent data analysis, metabolic marker selection, and similar tasks.
Detailed description of the embodiments
A specific embodiment of the method is described below on simulated data with reference to the technical solution. The simulated data serve only to illustrate the invention and aid understanding; they do not limit the invention.
Table 1 shows the simulated data. $x_i$ denotes the i-th sample, and the data contain 10 samples. $m_1$ to $m_5$ denote the 5 metabolites in the data, and NaN denotes a missing value.
Table 1: Simulated data
The data in Table 1 contain 4 missing values: $x_{13}$, $x_{52}$, $x_{84}$, and $x_{93}$. The procedure is illustrated below with $x_{13}$.
(1) Compute the distance d between sample $x_1$ and each of the other samples by formula (1): $d(x_1, x_2) = 1.94$, $d(x_1, x_3) = 1.73$, $d(x_1, x_4) = 3.39$, $d(x_1, x_5) = 3.46$, $d(x_1, x_6) = 4.12$, $d(x_1, x_7) = 2.29$, $d(x_1, x_8) = 2.71$, $d(x_1, x_9) = 2.74$, $d(x_1, x_{10}) = 3.16$. With k = 6, the 6 samples most similar to $x_1$ form the set $S_k = \{x_3, x_2, x_7, x_8, x_9, x_{10}\}$.
(2) Determine the missing type of metabolite $m_3$. Compute the Pearson correlation coefficients between $m_3$ and each of $m_1$, $m_2$, $m_4$, $m_5$. The computation shows that $m_4$ has the strongest correlation with $m_3$ and that the correlation is positive, so $m_4$ is chosen as the reference metabolite of $m_3$. The set of samples in which $m_3$ is a missing value is $S_{miss} = \{x_1, x_9\}$, and the set of samples in which $m_3$ is not missing is $S_{obs} = \{x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_{10}\}$. The mean of the reference metabolite $m_4$ on $S_{miss}$ is $\mu_{miss} = 7$, and its mean on $S_{obs}$ is $\mu_{obs} = 4.86$. Since $\mu_{miss} \ge \mu_{obs}$, the missing type is random missing, and the method proceeds to step (4).
(4) Among the 6 nearest neighbor samples of $x_1$, metabolite $m_3$ of sample $x_9$ is a missing value, so $x_{93}$ is temporarily filled with 6, the average $m_3$ content of $x_3$, $x_2$, $x_7$, $x_8$, and $x_{10}$.
(5) The $m_3$ values of the samples in $S_k$ are {3, 9, 5, 7, 6, 6}, with mean $\mu = 6$ and standard deviation $\sigma = 2$, so the stable interval is [4, 8]. The values $x_{33}$ and $x_{23}$ lie outside the stable interval, so samples $x_3$ and $x_2$ are deleted from $S_k$, giving $S'_k = \{x_7, x_8, x_9, x_{10}\}$.
(6) Compute the weights of the samples in $S'_k$. By formula (2), the weights of the samples in $S'_k$ are $w(x_1, x_7) = 0.29$, $w(x_1, x_8) = 0.25$, $w(x_1, x_9) = 0.25$, and $w(x_1, x_{10}) = 0.21$. By formula (3), the weighted average is $x_{13} = w(x_1, x_7) \cdot x_{73} + w(x_1, x_8) \cdot x_{83} + w(x_1, x_9) \cdot x_{93} + w(x_1, x_{10}) \cdot x_{10,3} = 5.95$. The value 5.95 is therefore taken as the estimated fill value of the missing value $x_{13}$.
The missing values $x_{52}$, $x_{84}$, and $x_{93}$ are each filled by steps (1) to (6) in the same way.
Claims (1)
1. A metabolomics data missing value filling method based on neighbor stability, characterized in that the steps are as follows:
Metabolites in biological samples are detected by mass spectrometry to obtain the spectral data of the metabolites; the spectral data are analyzed with the preprocessing operations of peak identification, peak matching, and normalization, and the metabolite content in each sample is determined to obtain the metabolomics data;
Let n denote the number of samples in the metabolomics data and p the number of metabolites in a sample; $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ denotes the vector of the contents of the p metabolites in the i-th sample, $1 \le i \le n$; when the content of metabolite m in sample $x_i$ is missing, that is, $x_{im}$ is a missing value, $1 \le m \le p$, the missing value $x_{im}$ is filled by the following steps:
(1) Compute the Euclidean distance $d(x_i, x_j)$ between sample $x_i$ and every other sample $x_j$, $1 \le j \le n$, $j \ne i$, by formula (1), in which $o_{il}$ indicates whether the content of the l-th metabolite of sample $x_i$ is missing, $1 \le l \le p$: $o_{il} = 0$ when the content of the l-th metabolite of $x_i$ is missing, and $o_{il} = 1$ otherwise; the sum $\sum_{l=1}^{p} o_{il} o_{jl}$ is the number of metabolites whose content is missing in neither $x_i$ nor $x_j$; the smaller the distance $d(x_i, x_j)$, the higher the similarity between $x_i$ and $x_j$; the k samples most similar to $x_i$ according to the Euclidean distance form the sample set $S_k$;
(2) Determine the missing type of the metabolite
Compute the Pearson correlation coefficient between metabolite m and every other metabolite; the metabolite most strongly correlated with m, denoted aux_m, is taken as the reference metabolite of m; the missing type of metabolite m is determined from the content distribution of the reference metabolite aux_m as follows:
Let $S_{miss} = \{x_j \mid x_{jm} \text{ is a missing value}, 1 \le j \le n\}$ denote the set of samples in the metabolomics data in which metabolite m is a missing value; let $S_{obs} = \{x_j \mid x_{jm} \text{ is not a missing value}, 1 \le j \le n\}$ denote the set of samples in which metabolite m is not a missing value; compute the average content of the reference metabolite aux_m on $S_{miss}$ and on $S_{obs}$, denoted $\mu_{miss}$ and $\mu_{obs}$; if m and aux_m are positively correlated, the missing type of m is non-random missing when $\mu_{miss} < \mu_{obs}$, and the method proceeds to step (3); otherwise the missing type of m is random missing, and the method proceeds to step (4); if m and aux_m are negatively correlated, the missing type of m is non-random missing when $\mu_{miss} > \mu_{obs}$, and the method proceeds to step (3); otherwise the missing type of m is random missing, and the method proceeds to step (4);
(3) Handling of the non-random missing type
When the content of metabolite m is missing for some samples in $S_k$, the missing content of m in those samples is temporarily filled with the minimum content of metabolite m over all samples in the metabolomics data;
(4) Handling of the random missing type
When the content of metabolite m is missing for some samples in $S_k$, the missing content of m in those samples is temporarily filled with the average content of metabolite m over the samples in $S_k$ whose content of m is not missing; when the content of metabolite m is missing for all samples in $S_k$, the missing content of m in the samples of $S_k$ is temporarily filled with the minimum content of metabolite m over all remaining samples in the metabolomics data;
(5) Determine the stable neighbor samples
The stable neighbor samples in $S_k$ are determined from how much the content of metabolite m fluctuates among the samples in $S_k$; compute the mean $\mu$ and the standard deviation $\sigma$ of the content of metabolite m over the samples in $S_k$; any sample in $S_k$ whose content of metabolite m lies outside the range $[\mu - \sigma, \mu + \sigma]$ is deleted from $S_k$, which finally yields the stable neighbor sample set $S'_k$;
(6) Compute the weighted average of the content of metabolite m over the samples in $S'_k$; fill the missing content of metabolite m in sample $x_i$ with the value $x_{im}$ given by formula (3); the formulas are as follows:

$$w(x_i, s_j) = \frac{1 / d(x_i, s_j)}{\sum_{l=1}^{k'} 1 / d(x_i, s_l)} \qquad (2)$$

$$x_{im} = \sum_{l=1}^{k'} w(x_i, s_l)\, s_{lm} \qquad (3)$$

Here $k' = |S'_k|$ is the number of samples in the set $S'_k$; $s_j, s_l$ ($1 \le j, l \le k'$) are samples in $S'_k$; $w(x_i, s_j)$ is the weight that the content of metabolite m of sample $s_j$ carries in the computation of $x_{im}$; $d(x_i, s_j)$ is the Euclidean distance between $x_i$ and $s_j$ computed by formula (1); $s_{lm}$ is the content of metabolite m of sample $s_l$; the contents of m in different neighbor samples receive different weights according to the distance between the neighbor sample and $x_i$; the smaller the distance between a sample in $S'_k$ and $x_i$, the larger the weight of its content of metabolite m, and the larger its share in the computation of $x_{im}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910284004.XA CN110097920B (en) | 2019-04-10 | 2019-04-10 | Metabonomics data missing value filling method based on neighbor stability |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910284004.XA CN110097920B (en) | 2019-04-10 | 2019-04-10 | Metabonomics data missing value filling method based on neighbor stability |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110097920A (en) | 2019-08-06 |
CN110097920B (en) | 2022-09-20 |
Family
ID=67444595
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910284004.XA (Expired - Fee Related), CN110097920B (en) | 2019-04-10 | 2019-04-10 | Metabonomics data missing value filling method based on neighbor stability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110097920B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177088A (en) * | 2013-03-08 | 2013-06-26 | 北京理工大学 | Biomedicine missing data compensation method |
WO2015004502A1 (en) * | 2013-07-09 | 2015-01-15 | Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi | Method for imputing corrupted data based on localizing anomalous parts |
CN104298893A (en) * | 2014-09-30 | 2015-01-21 | 西南交通大学 | Imputation method of genetic expression deletion data |
CN105424827A (en) * | 2015-11-07 | 2016-03-23 | 大连理工大学 | Screening and calibrating method of metabolomic data random errors |
CN106407464A (en) * | 2016-10-12 | 2017-02-15 | 南京航空航天大学 | KNN-based improved missing data filling algorithm |
CN106777938A (en) * | 2016-12-06 | 2017-05-31 | 合肥工业大学 | A kind of microarray missing value estimation method based on adaptive weighting |
CN108256538A (en) * | 2016-12-28 | 2018-07-06 | 北京酷我科技有限公司 | A kind of subscriber data Forecasting Methodology and system |
CN107193876A (en) * | 2017-04-21 | 2017-09-22 | 美林数据技术股份有限公司 | A kind of missing data complementing method based on arest neighbors KNN algorithms |
CN108563770A (en) * | 2018-04-20 | 2018-09-21 | 南京邮电大学 | A kind of KPI and various dimensions network data cleaning method based on scene |
CN109472343A (en) * | 2018-10-16 | 2019-03-15 | 上海电机学院 | A kind of improvement sample data missing values based on GKNN fill up algorithm |
Non-Patent Citations (4)
Title |
---|
CHI ZHANG等: "The Nearest Neighbor Algorithm of Filling Missing Data Based on Cluster Analysis", 《APPLIED MECHANICS AND MATERIALS》 * |
刘月程等: "质谱代谢组学数据预处理方法研究", 《化学分析计量》 * |
林晓惠等: "基于相关性组合变量的色谱数据分析方法", 《第21届全国色谱学术报告会及仪器展览会会议论文集》 * |
董学思等: "多组学联合缺失数据填补方法的评价", 《中国卫生统计》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111737463A (en) * | 2020-06-04 | 2020-10-02 | 江苏名通信息科技有限公司 | Big data missing value filling method, device and computer program |
CN111737463B (en) * | 2020-06-04 | 2024-02-09 | 江苏名通信息科技有限公司 | Big data missing value filling method, device and computer readable memory |
CN111859275A (en) * | 2020-07-20 | 2020-10-30 | 厦门大学 | Mass spectrum data missing value filling method and system based on non-negative matrix factorization |
CN111859275B (en) * | 2020-07-20 | 2022-08-12 | 厦门大学 | Mass spectrum data missing value filling method and system based on non-negative matrix factorization |
CN113485986A (en) * | 2021-06-25 | 2021-10-08 | 国网江苏省电力有限公司信息通信分公司 | Electric power data restoration method |
Also Published As
Publication number | Publication date |
---|---|
CN110097920B (en) | 2022-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110097920A (en) | Metabolomics data missing value filling method based on neighbor stability | |
Keun | Metabonomic modeling of drug toxicity | |
US20160216244A1 (en) | Method and electronic nose for comparing odors | |
Cooper et al. | Relative, label-free protein quantitation: spectral counting error statistics from nine replicate MudPIT samples | |
CN111122757B (en) | Metabonomics-based research method for bee toxicity effect caused by date flower honey | |
WO1999050437A1 (en) | Methodology for predicting and/or diagnosing disease | |
Hoehenwarter et al. | A rapid approach for phenotype‐screening and database independent detection of cSNP/protein polymorphism using mass accuracy precursor alignment | |
CN101915815A (en) | Method for detecting freshness of meat | |
CA2777501A1 (en) | Biomarkers and identification methods for the early detection and recurrence prediction of breast cancer using nmr | |
CN109557165B (en) | Method for monitoring the quality of a mass spectrometry imaging preparation workflow | |
Yang et al. | Serum metabolic profiling study of endometriosis by using wooden-tip electrospray ionization mass spectrometry | |
WO2023207453A1 (en) | Traditional chinese medicine ingredient analysis method and system based on spectral clustering | |
CN109920473A (en) | A kind of metabolism group marker weight analysis universal method | |
CN102488309B (en) | Intelligent tobacco formulation method | |
Shurubor et al. | Analytical precision, biological variation, and mathematical normalization in high data density metabolomics | |
CN108491690B (en) | Method for predicting quantitative efficiency of peptide fragment in proteomics | |
US20100311600A1 (en) | Breast cancer biomarkers and identification methods using nmr and gas chromatography-mass spectrometry | |
Straume et al. | [7] Model-independent quantification of measurement error: Empirical estimation of discrete variance function profiles based on standard curves | |
JP2011085427A (en) | Method for acquiring degree of metabolic disorder, method for determining metabolic disorder, program therefor, device for acquiring degree of metabolic disorder, and diagnostic program based on determination of metabolic disorder | |
CN108287200A (en) | Materials analysis methods of the mass spectrum with reference to the method for building up of database and based on it | |
Xia et al. | Protein abundance ratios for global studies of prokaryotes | |
CN110583573A (en) | Construction and evaluation method of blood deficiency mouse model | |
US9189595B2 (en) | Apparatus and associated method for analyzing small molecule components in a complex mixture | |
US20160379811A1 (en) | Method of determining cell cycle stage distribution of cells | |
CN118133048B (en) | College student physique test data acquisition method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20220920 |