CN110245157A - Data difference analysis method and system based on probability density estimation - Google Patents
Data difference analysis method and system based on probability density estimation
- Publication number
- CN110245157A (application CN201910471042.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- value
- variation
- distribution
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/50—Mutagenesis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Data Mining & Analysis (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Biochemistry (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Library & Information Science (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a data difference analysis method and system based on probability density estimation, belonging to the field of data analysis. The method first establishes a data set in which the data change. The joint probability density of the data before and after the change is then estimated by a probability density estimation method. The optimal window width is selected by the maximum likelihood method: for each candidate window width, each point in the data set is taken in turn, a joint probability distribution is built from the remaining points in the data set, and the joint probability density value of the held-out point under that distribution is calculated; the product of the resulting joint probability density values is the likelihood value, and the window width that maximizes the likelihood value is the optimal window width. Using the optimal window width, the joint probability density distribution of the data before and after the change is obtained by the probability density estimation method, and the differences in the data are analyzed. The method can determine the degree of significance of each data item without being restricted by the data distribution, and is used to find data that change significantly.
Description
Technical field
The present invention relates to the field of data analysis, and more particularly to a data difference analysis method and system based on probability density estimation.
Background art
Data that change significantly are often of key importance. For example, proteomic profiling techniques give the expression level of each protein in an experimental group and a control group, and proteins whose expression differs significantly may play a key regulatory role in the process under study. Differential proteins are commonly identified by fold change, on the assumption that a protein with a larger fold change differs more significantly. In most cases, however, this assumption does not hold: a change from 1 to 2 and a change from 10 to 20 are both 2-fold changes, but that does not mean their differences are equally significant. As another example, among amino acid mutations that affect the modification state of a protein, mutations that significantly change the modification state are often the more important ones. Omar et al. developed a method (MIMP) for predicting the impact of mutations on phosphorylation. However, the formula used in MIMP to calculate the joint probability is not valid for independent two-dimensional variables, and the method cannot calculate the statistical significance level of the impact of a mutation on phosphorylation. At present there is no good solution to these problems, so developing a new method to solve problems of this kind is critical. The present invention provides a data difference analysis method based on probability density estimation; the method has statistical significance and is applicable regardless of how the data are distributed.
Summary of the invention
The present invention addresses the technical problem that data difference analysis methods in the prior art are restricted by the data distribution and also lack statistical significance. The invention obtains the joint probability density distribution of the data before and after the change by a probability density estimation method, and then judges the significance of each data change by hypothesis testing. The method can determine the degree of significance of each data item without being restricted by the data distribution, and is used to find data that change significantly.
According to a first aspect of the invention, a data difference analysis method based on probability density estimation is provided, comprising the following steps:
(1) Denote the number of groups of data in the data set as n, where n is a positive integer. Each group of data contains a value before the change and the corresponding value after the change; denote the value before the change as x and the value after the change as y. Establish a coordinate system U with the data before the change as the abscissa and the data after the change as the ordinate; the coordinate point corresponding to any group of data is (x_i, y_i), where 1 ≤ i ≤ n;
(2) Estimate the joint probability density distribution of the data before and after the change using a probability density estimation method based on a Gaussian kernel. The formula used is: [formula not reproduced], where h is the window width, n is the number of groups of data in the data set, and f(x, y) is the probability density value at any point (x, y) in coordinate system U. Select the optimal h by the maximum likelihood method, specifically: for each candidate h, take one of the coordinate points of the data set in coordinate system U at a time, build a joint probability distribution from the remaining n-1 points, and calculate the joint probability density value of the held-out point under that distribution, giving n joint probability density values; the product of the n joint probability density values is the likelihood value, and the h that maximizes the likelihood value is the optimal h. Substitute the optimal h into the formula, and then build the optimal joint probability distribution from all coordinate points of the data set in coordinate system U;
(3) Fix the value x before the change, and obtain the probability density distribution, within the optimal joint probability distribution of step (2), of the value y after the change given that fixed x. First, for the fixed x, establish a coordinate system U' with the values of y as the X' axis and the probability density under the optimal h for that fixed x as the Y' axis. Then, for any group of data (x_i, y_i) in the data set, obtain the probability density distribution of the value y after the change given x_i, and determine the direction and degree of change of this group of data from the position of y_i on the X' axis of coordinate system U', specifically: take a point on the X' axis of coordinate system U' and draw through it a straight line perpendicular to the X' axis; this line divides the area enclosed by the density curve and the X' axis into a left part and a right part; denote this point as y_0. If y_i is greater than y_0, the change of data point (x_i, y_i) is an up-regulation, and the degree of significance P of the up-regulation is the ratio of the area of the distribution where y > y_i to the area enclosed by the density curve and the X' axis. If y_i is less than y_0, the change of data point (x_i, y_i) is a down-regulation, and the degree of significance P of the down-regulation is the ratio of the area of the distribution where y < y_i to the area enclosed by the density curve and the X' axis. If y_i equals y_0, data point (x_i, y_i) has not changed.
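The formula referred to in step (2) is not reproduced in this text. As a hedged sketch only, the block below writes out the standard two-dimensional Gaussian-kernel estimate with a single window width h, which is assumed (not confirmed by the source) to be the form intended, together with the leave-one-out likelihood of step (2) and the area ratios of step (3). The symbols f_{-i}, L(h), h^*, P_up and P_down are notation introduced here for illustration.

```latex
\begin{align*}
% Assumed form: product Gaussian kernel with one shared window width h
f(x, y) &= \frac{1}{n h^{2}} \sum_{i=1}^{n} \frac{1}{2\pi}
  \exp\!\left( -\frac{(x - x_i)^{2} + (y - y_i)^{2}}{2 h^{2}} \right) \\
% Density at point i estimated from the other n-1 points
f_{-i}(x_i, y_i \mid h) &= \frac{1}{(n-1)\, h^{2}} \sum_{j \neq i} \frac{1}{2\pi}
  \exp\!\left( -\frac{(x_i - x_j)^{2} + (y_i - y_j)^{2}}{2 h^{2}} \right) \\
% Likelihood value and optimal window width
L(h) &= \prod_{i=1}^{n} f_{-i}(x_i, y_i \mid h), \qquad h^{*} = \arg\max_{h} L(h) \\
% Density of y for a fixed x_i, normalised by the area under the slice
f(y \mid x_i) &= \frac{f(x_i, y)}{\int_{-\infty}^{\infty} f(x_i, t)\, dt} \\
% Degree of significance for up- and down-regulation
P_{\mathrm{up}} &= \int_{y_i}^{\infty} f(y \mid x_i)\, dy, \qquad
P_{\mathrm{down}} = \int_{-\infty}^{y_i} f(y \mid x_i)\, dy
\end{align*}
```

Under this assumed form, the division by the integral over t corresponds to "the area enclosed by the density curve and the X' axis" in step (3), so each P is a proportion between 0 and 1.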
Preferably, each group of data in the data set of step (1) is the probability that an amino acid site is modified, before and after at least one amino acid around the site mutates.
Preferably, each group of data in the data set of step (1) is the probability that a lysine site is succinylated, before and after a missense mutation occurs in at least one of the N amino acids on each side of the lysine site, where N is an integer and 0 < N ≤ 50.
Preferably, 5 ≤ N ≤ 15.
Preferably, the data set of step (1) is the level of RNA produced or protein expressed by cells before and after the cells are treated with a drug.
Preferably, n is greater than or equal to 1000.
According to another aspect of the invention, a data difference analysis system based on probability density estimation is provided, comprising:
Data set establishment module: this module is used to establish the data set whose differences are to be analyzed. Denote the number of groups of data in the data set as n, where n is a positive integer. Each group of data contains a value before the change and the corresponding value after the change; denote the value before the change as x and the value after the change as y. Establish a coordinate system U with the data before the change as the abscissa and the data after the change as the ordinate; the coordinate point corresponding to any group of data is (x_i, y_i), where 1 ≤ i ≤ n;
Optimal window width calculation module: this module is used to calculate the optimal window width h and obtain the optimal joint probability distribution. The joint probability density distribution of the data before and after the change is estimated using a probability density estimation method based on a Gaussian kernel. The formula used is: [formula not reproduced], where h is the window width, n is the number of groups of data in the data set, and f(x, y) is the probability density value at any point (x, y) in coordinate system U. The optimal h is selected by the maximum likelihood method, specifically: for each candidate h, take one of the coordinate points of the data set in coordinate system U at a time, build a joint probability distribution from the remaining n-1 points, and calculate the joint probability density value of the held-out point under that distribution, giving n joint probability density values; the product of the n joint probability density values is the likelihood value, and the h that maximizes the likelihood value is the optimal h. Substitute the optimal h into the formula, and then build the optimal joint probability distribution from all coordinate points of the data set in coordinate system U;
Data difference analysis module: this module is used to analyze the differences in the data of the data set before and after the change. Fix the value x before the change, and obtain the probability density distribution, within the optimal joint probability distribution described in step (2), of the value y after the change given that fixed x. First, for the fixed x, establish a coordinate system U' with the values of y as the X' axis and the probability density under the optimal h for that fixed x as the Y' axis. Then, for any group of data (x_i, y_i) in the data set, obtain the probability density distribution of the value y after the change given x_i, and determine the direction and degree of change of this group of data from the position of y_i on the X' axis of coordinate system U', specifically: take a point on the X' axis of coordinate system U' and draw through it a straight line perpendicular to the X' axis; this line divides the area enclosed by the density curve and the X' axis into a left part and a right part; denote this point as y_0. If y_i is greater than y_0, the change of data point (x_i, y_i) is an up-regulation, and the degree of significance P of the up-regulation is the ratio of the area of the distribution where y > y_i to the area enclosed by the density curve and the X' axis. If y_i is less than y_0, the change of data point (x_i, y_i) is a down-regulation, and the degree of significance P of the down-regulation is the ratio of the area of the distribution where y < y_i to the area enclosed by the density curve and the X' axis. If y_i equals y_0, data point (x_i, y_i) has not changed.
In general, compared with the prior art, the above technical solutions conceived by the invention have the following main technical advantages:
(1) The invention discloses a data difference analysis method based on probability density estimation. The method has statistical significance and is applicable regardless of how the data are distributed, with no restricting conditions, which helps people find the key items among data that differ.
(2) The invention rests on the following observation: a change from 1 to 2 and a change from 10 to 20 are both 2-fold changes, but that does not mean their differences are equally significant, whereas a change from 1 to 3 is more significant than a change from 1 to 2. Based on this principle, the significance of the difference before and after mutation is assessed by estimating, for each value before the change, the probability density distribution of the corresponding values after mutation.
(3) The value of h in the joint probability density formula of the invention affects the quality of the estimated joint probability density distribution. To obtain the best estimate of the joint probability density distribution, the invention selects the optimal h by the maximum likelihood method: for each candidate h (0 < h < 1), one point of the data set is taken at a time, a joint probability distribution is built from the remaining n-1 points in the data set, and the joint probability density value of the held-out point under that distribution is calculated, giving n joint probability density values. The likelihood value is the product of the n joint probability density values, and the h that maximizes the likelihood value is the optimal h, because the probability density distribution under that h is the one most likely to match the actual distribution.
(4) The invention fixes each value x before the change and obtains the probability density distribution of the value y after the change given that x; hypothesis testing is carried out using this distribution. Data with P-value < 0.05 are generally considered to have changed significantly: an increase relative to the value before the change is regarded as an up-regulation, and a decrease as a down-regulation.
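The fixed-x test in advantage (4) can be sketched as below. This is a minimal illustration, not the patented implementation: it assumes the standard Gaussian product kernel with one window width h, under which the distribution of y for a fixed x is a weighted mixture of one-dimensional Gaussians, so the tail area has a closed form. The function name `conditional_tail_p`, its parameters, and the toy data are all hypothetical. It uses only the standard library and loops over every point, so it is meant for small examples.

```python
import math


def conditional_tail_p(points, h, x, y_obs, upper=True):
    """Tail area of y given x under a 2-D Gaussian-kernel density estimate.

    points : list of (x_j, y_j) pairs (values before and after the change)
    h      : window width, assumed to have been chosen already
    upper  : True  -> area where y > y_obs (candidate up-regulation)
             False -> area where y < y_obs (candidate down-regulation)
    """
    # Each sample's weight in the slice at x comes from the x-part of its kernel.
    weights = [math.exp(-((x - xj) ** 2) / (2.0 * h * h)) for xj, _ in points]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no kernel mass at this x; h is too small for this slice")

    tail = 0.0
    for w, (_, yj) in zip(weights, points):
        z = (y_obs - yj) / h
        # Gaussian tails via erfc: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
        s = z if upper else -z
        tail += w * 0.5 * math.erfc(s / math.sqrt(2.0))
    # Dividing by the total weight is the ratio to the whole area under the slice.
    return tail / total


if __name__ == "__main__":
    # Hypothetical toy data: y varies little near x = 1 and a lot near x = 10.
    toy = [(1.0, 1.0), (1.1, 1.2), (0.9, 0.9), (1.0, 1.1),
           (10.0, 8.0), (10.2, 14.0), (9.8, 19.0), (10.1, 11.0)]
    print(conditional_tail_p(toy, h=0.5, x=1.0, y_obs=2.0))    # change 1 -> 2
    print(conditional_tail_p(toy, h=0.5, x=10.0, y_obs=20.0))  # change 10 -> 20
```

The two calls in the demo have the same fold change but are scored against different slices of the estimated density, which is the point made in advantage (2).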
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 shows the enrichment of the 218 genes containing KsuMs in the cancer gene and drug target gene data sets.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.
Embodiment 1
We applied the method of the invention to predict mutations that significantly affect existing succinylation sites. This helps to discover genes that influence cancer by altering the succinylation network, and provides deeper insight into disease biology and the development of treatments. For the analysis of the impact of mutations on succinylation, we integrated 1,779,214 missense mutations from 11,659 tumor samples across 33 major cancer types/subtypes in the cancer genome database The Cancer Genome Atlas (TCGA). Of these, 63,693 missense mutations occur around lysine sites (within 10 amino acids on either side); these are referred to as KsuMs. As shown in Fig. 1, we used a succinylation site prediction platform to obtain probability scores for the 63,693 peptides containing KsuMs; the probability score reflects the degree of succinylation of the site. The joint probability density of the Bayesian posterior probabilities before and after mutation was then estimated using the Parzen window method based on a Gaussian kernel:
[formula not reproduced]
where h is the window width and n is the number of KsuMs; here n = 63,693. The choice of h determines the quality of the probability density estimate. We select the optimal h by the maximum likelihood method: for each candidate h, one point is taken at a time, the joint probability density is estimated from the other n-1 points, and the probability density value at the held-out point is calculated, finally giving n probability density values. The likelihood value is the product of the n probability density values:
f((x_1, y_1), (x_2, y_2), ..., (x_n, y_n) | h) = f((x_1, y_1) | h) × f((x_2, y_2) | h) × … × f((x_n, y_n) | h).
The h that maximizes the likelihood value is the optimal h; here the optimal h = 0.018.
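The window width selection described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes the standard two-dimensional Gaussian kernel with a single window width h, and it maximizes the sum of log densities rather than the raw product, which selects the same h while avoiding floating-point underflow when n is large. The names `loo_log_likelihood` and `select_bandwidth` and the candidate grid are hypothetical. The double loop is quadratic in n, so a data set of 63,693 points would need a vectorized or approximate implementation.

```python
import math


def loo_log_likelihood(points, h):
    """Leave-one-out log-likelihood of a 2-D Gaussian-kernel density estimate.

    points : list of (x_i, y_i) pairs
    h      : candidate window width (must be positive)
    """
    n = len(points)
    # Normalising constant of the estimate built from the other n-1 points.
    norm = 1.0 / ((n - 1) * 2.0 * math.pi * h * h)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        dens = 0.0
        for j, (xj, yj) in enumerate(points):
            if j == i:
                continue  # hold point i out of its own density estimate
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            dens += math.exp(-d2 / (2.0 * h * h))
        dens *= norm
        if dens <= 0.0:
            # The held-out point has zero estimated density, so the product is zero.
            return float("-inf")
        total += math.log(dens)
    return total


def select_bandwidth(points, candidates):
    """Return the candidate h with the largest leave-one-out likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(points, h))


if __name__ == "__main__":
    # Hypothetical toy probabilities in [0, 1]; the grid of candidates is an assumption.
    toy = [(0.10, 0.12), (0.15, 0.20), (0.40, 0.35), (0.42, 0.60),
           (0.70, 0.65), (0.72, 0.30), (0.90, 0.95), (0.88, 0.80)]
    grid = [k / 100.0 for k in range(1, 100)]  # 0.01 .. 0.99, i.e. 0 < h < 1
    print(select_bandwidth(toy, grid))
```

A grid search is only one way to explore 0 < h < 1; the source does not say how the candidate values were generated.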
Finally, the probability density distribution is as shown in Fig. 2. Fixing x, we obtain the probability density distribution of y given that x, and carry out hypothesis testing with P-value < 0.05 as the threshold, yielding the KsuMs that significantly strengthen or weaken succinylation after mutation. For up-regulation we require the posterior probability after mutation to be greater than 0.5, to ensure that the site is succinylated after mutation; for down-regulation we require the posterior probability before mutation to be greater than 0.5, to ensure that the site is succinylated before mutation. This finally gives 306 KsuMs that significantly weaken succinylation and 64 KsuMs that significantly strengthen succinylation, located on 218 genes.
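The filtering rule in this paragraph can be written as a small decision function. The sketch below is illustrative only: the function name `classify_change` and its parameters are hypothetical, and the direction of change is decided by comparing the value after mutation with the value before it, which is an assumption drawn from the statement that an increase relative to the value before the change counts as up-regulation.

```python
def classify_change(before, after, p_value, alpha=0.05, min_posterior=0.5):
    """Label one (before, after) pair of posterior probabilities.

    before, after : predicted succinylation probability before / after mutation
    p_value       : significance of the change, computed elsewhere
    Returns "up", "down", or "none".
    """
    if p_value >= alpha:
        return "none"  # not significant at the chosen threshold
    if after > before and after > min_posterior:
        return "up"    # site is predicted to be succinylated after mutation
    if after < before and before > min_posterior:
        return "down"  # site was predicted to be succinylated before mutation
    return "none"      # significant, but the site is not succinylated on the relevant side


if __name__ == "__main__":
    # Hypothetical values, not taken from the data set described above.
    print(classify_change(before=0.20, after=0.85, p_value=0.01))
    print(classify_change(before=0.90, after=0.10, p_value=0.03))
    print(classify_change(before=0.10, after=0.30, p_value=0.01))
```

The third call shows why the 0.5 cutoff matters: a statistically significant increase is still discarded when the site is not predicted to be succinylated after the mutation.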
As shown in Fig. 2, the 218 genes were mapped to the 719 cancer genes in the Cancer Gene Census (CGC) database and to the 2,921 drug target genes in the drug target database DrugBank. Hypergeometric analysis found significant enrichment in both data sets, with enrichment of 2.62-fold (P-value = 3.03E-04) and 4.15-fold (P-value = 1.20E-44) respectively. This implies that these 218 succinylation-related genes are highly associated with cancer, and also indicates that our results are reliable.
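A hypergeometric enrichment test of the kind mentioned above can be sketched as follows. The source does not state the size of the background gene set or the number of overlapping genes, so the values of `population` and `overlap` in the demo are hypothetical placeholders and the printed numbers are not expected to reproduce the reported figures. The function names are also introduced here for illustration; only the standard library is used.

```python
import math


def hypergeom_upper_tail(overlap, population, annotated, drawn):
    """P(X >= overlap) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` belong to the reference set."""
    denom = math.comb(population, drawn)
    upper = min(annotated, drawn)
    # math.comb returns 0 for impossible combinations, so no extra bounds are needed.
    num = sum(
        math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return num / denom


def fold_enrichment(overlap, population, annotated, drawn):
    """Observed overlap fraction divided by the fraction expected by chance."""
    return (overlap / drawn) / (annotated / population)


if __name__ == "__main__":
    population = 20000  # hypothetical background size (not given in the source)
    annotated = 719     # cancer genes in CGC, from the text
    drawn = 218         # genes carrying significant KsuMs, from the text
    overlap = 20        # hypothetical overlap (not given in the source)
    print(fold_enrichment(overlap, population, annotated, drawn))
    print(hypergeom_upper_tail(overlap, population, annotated, drawn))
```

Exact integer arithmetic with `math.comb` keeps the tail probability accurate even when it is very small, at the cost of some speed for large gene sets.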
Those skilled in the art will readily understand that the foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (7)
1. A data difference analysis method based on probability density estimation, characterized by comprising the following steps:
(1) denoting the number of groups of data in the data set as n, where n is a positive integer; each group of data contains a value before the change and the corresponding value after the change; denoting the value before the change as x and the value after the change as y; establishing a coordinate system U with the data before the change as the abscissa and the data after the change as the ordinate, the coordinate point corresponding to any group of data being (x_i, y_i), where 1 ≤ i ≤ n;
(2) estimating the joint probability density distribution of the data before and after the change using a probability density estimation method based on a Gaussian kernel, the formula used being: [formula not reproduced], where h is the window width, n is the number of groups of data in the data set, and f(x, y) is the probability density value at any point (x, y) in coordinate system U; selecting the optimal h by the maximum likelihood method, specifically: for each candidate h, taking one of the coordinate points of the data set in coordinate system U at a time, building a joint probability distribution from the remaining n-1 points, and calculating the joint probability density value of the held-out point under that distribution, giving n joint probability density values; the product of the n joint probability density values is the likelihood value, and the h that maximizes the likelihood value is the optimal h; substituting the optimal h into the formula, and then building the optimal joint probability distribution from all coordinate points of the data set in coordinate system U;
(3) fixing the value x before the change, and obtaining the probability density distribution, within the optimal joint probability distribution of step (2), of the value y after the change given that fixed x; first, for the fixed x, establishing a coordinate system U' with the values of y as the X' axis and the probability density under the optimal h for that fixed x as the Y' axis; then, for any group of data (x_i, y_i) in the data set, obtaining the probability density distribution of the value y after the change given x_i, and determining the direction and degree of change of this group of data from the position of y_i on the X' axis of coordinate system U', specifically: taking a point on the X' axis of coordinate system U' and drawing through it a straight line perpendicular to the X' axis, this line dividing the area enclosed by the density curve and the X' axis into a left part and a right part; denoting this point as y_0; if y_i is greater than y_0, the change of data point (x_i, y_i) is an up-regulation, and the degree of significance P of the up-regulation is the ratio of the area of the distribution where y > y_i to the area enclosed by the density curve and the X' axis; if y_i is less than y_0, the change of data point (x_i, y_i) is a down-regulation, and the degree of significance P of the down-regulation is the ratio of the area of the distribution where y < y_i to the area enclosed by the density curve and the X' axis; if y_i equals y_0, data point (x_i, y_i) has not changed.
2. The data difference analysis method based on probability density estimation according to claim 1, characterized in that each group of data in the data set of step (1) is the probability that an amino acid site is modified, before and after at least one amino acid around the amino acid site mutates.
3. The data difference analysis method based on probability density estimation according to claim 2, characterized in that each group of data in the data set of step (1) is the probability that a lysine site is succinylated, before and after a missense mutation occurs in at least one of the N amino acids on each side of the lysine site; N is an integer and 0 < N ≤ 50.
4. The data difference analysis method based on probability density estimation according to claim 3, characterized in that 5 ≤ N ≤ 15.
5. The data difference analysis method based on probability density estimation according to claim 1, characterized in that the data set of step (1) is the level of RNA produced or protein expressed by cells before and after the cells are treated with a drug.
6. The data difference analysis method based on probability density estimation according to claim 1, characterized in that n is greater than or equal to 1000.
7. A data difference analysis system based on probability density estimation, characterized by comprising:
a data set establishment module, used to establish the data set whose differences are to be analyzed; the number of groups of data in the data set is denoted as n, where n is a positive integer; each group of data contains a value before the change and the corresponding value after the change; the value before the change is denoted as x and the value after the change as y; a coordinate system U is established with the data before the change as the abscissa and the data after the change as the ordinate, the coordinate point corresponding to any group of data being (x_i, y_i), where 1 ≤ i ≤ n;
an optimal window width calculation module, used to calculate the optimal window width h and obtain the optimal joint probability distribution; the joint probability density distribution of the data before and after the change is estimated using a probability density estimation method based on a Gaussian kernel, the formula used being: [formula not reproduced], where h is the window width, n is the number of groups of data in the data set, and f(x, y) is the probability density value at any point (x, y) in coordinate system U; the optimal h is selected by the maximum likelihood method, specifically: for each candidate h, one of the coordinate points of the data set in coordinate system U is taken at a time, a joint probability distribution is built from the remaining n-1 points, and the joint probability density value of the held-out point under that distribution is calculated, giving n joint probability density values; the product of the n joint probability density values is the likelihood value, and the h that maximizes the likelihood value is the optimal h; the optimal h is substituted into the formula, and the optimal joint probability distribution is then built from all coordinate points of the data set in coordinate system U;
a data difference analysis module, used to analyze the differences in the data of the data set before and after the change; the value x before the change is fixed, and the probability density distribution, within the optimal joint probability distribution described in step (2), of the value y after the change given that fixed x is obtained; first, for the fixed x, a coordinate system U' is established with the values of y as the X' axis and the probability density under the optimal h for that fixed x as the Y' axis; then, for any group of data (x_i, y_i) in the data set, the probability density distribution of the value y after the change given x_i is obtained, and the direction and degree of change of this group of data are determined from the position of y_i on the X' axis of coordinate system U', specifically: a point is taken on the X' axis of coordinate system U' and a straight line perpendicular to the X' axis is drawn through it, this line dividing the area enclosed by the density curve and the X' axis into a left part and a right part; this point is denoted as y_0; if y_i is greater than y_0, the change of data point (x_i, y_i) is an up-regulation, and the degree of significance P of the up-regulation is the ratio of the area of the distribution where y > y_i to the area enclosed by the density curve and the X' axis; if y_i is less than y_0, the change of data point (x_i, y_i) is a down-regulation, and the degree of significance P of the down-regulation is the ratio of the area of the distribution where y < y_i to the area enclosed by the density curve and the X' axis; if y_i equals y_0, data point (x_i, y_i) has not changed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910471042.6A CN110245157B (en) | 2019-05-31 | 2019-05-31 | Data difference analysis method and system based on probability density estimation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910471042.6A CN110245157B (en) | 2019-05-31 | 2019-05-31 | Data difference analysis method and system based on probability density estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110245157A (en) | 2019-09-17 |
CN110245157B (en) | 2021-06-11 |
Family
ID=67885806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910471042.6A Active CN110245157B (en) | 2019-05-31 | 2019-05-31 | Data difference analysis method and system based on probability density estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110245157B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103207997A (en) * | 2013-04-15 | 2013-07-17 | 浙江捷尚视觉科技有限公司 | Kernel density estimation-based license plate character segmentation method |
CN103776891A (en) * | 2013-09-04 | 2014-05-07 | 中国科学院计算技术研究所 | Method for detecting differentially-expressed protein |
CN106533577A (en) * | 2016-10-09 | 2017-03-22 | 南京工业大学 | Non-Gaussian noise suppression method based on energy detection |
US20170364664A1 (en) * | 2014-02-25 | 2017-12-21 | Flagship Biosciences, Inc. | Method for stratifying and selecting candidates for receiving a specific therapeutic approach |
CN108763872A (en) * | 2018-04-25 | 2018-11-06 | 华中科技大学 | Method for analyzing and predicting the influence of cancer mutations on LIR motif function |
CN109815870A (en) * | 2019-01-17 | 2019-05-28 | 华中科技大学 | High-throughput functional gene screening method and system based on quantitative analysis of cell phenotype images |
Non-Patent Citations (3)
Title |
---|
BABICH et al.: "Weighted parzen windows for pattern classification", IEEE Transactions on Pattern Analysis & Machine Intelligence * |
TAHERZADEH et al.: "predicting lysine-malonylation sites of proteins using sequence and predicted structural features", Journal of Computational Chemistry * |
Xu Yang et al.: "WERAM: on eukaryotic histone acetylation and methyl...", 2nd Youth Science and Technology Forum of the Chinese Society of Biotechnology * |
Also Published As
Publication number | Publication date |
---|---|
CN110245157B (en) | 2021-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Miller et al. | Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities | |
Chen et al. | Convex clustering: An attractive alternative to hierarchical clustering | |
CN109637579B (en) | Tensor random walk-based key protein identification method | |
Huff et al. | Detecting positive selection from genome scans of linkage disequilibrium | |
Suresh et al. | Recurrent neural network for genome sequencing for personalized cancer treatment in precision healthcare | |
Wang et al. | Variational inference for coupled hidden markov models Applied to the Joint Detection of Copy Number Variations | |
Tran et al. | A novel method for single-cell data imputation using subspace regression | |
Song et al. | MiXcan: a framework for cell-type-aware transcriptome-wide association studies with an application to breast cancer | |
Huo et al. | Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals | |
CN110245157A (en) | Data difference analysis method and system based on probability density estimation | |
Li et al. | Evolving spatial clusters of genomic regions from high-throughput chromatin conformation capture data | |
Nouira et al. | Multitask group Lasso for Genome Wide association Studies in diverse populations | |
Cong et al. | Big data driven oriented graph theory aided tagsnps selection for genetic precision therapy | |
Bhattacharya et al. | Effects of gene–environment and gene–gene interactions in case-control studies: A novel Bayesian semiparametric approach | |
CN111785319A (en) | Drug relocation method based on differential expression data | |
Li et al. | A comparative study for identifying the chromosome-wide spatial clusters from high-throughput chromatin conformation capture data | |
Xia | Sequence-based multiscale modeling for high-throughput chromosome conformation capture (Hi-C) data analysis | |
Joo | Bayesian lasso: An extension for genome-wide association study | |
Boitard et al. | Linkage disequilibrium interval mapping of quantitative trait loci | |
Liu et al. | Inferring single-cell copy number profiles through cross-cell segmentation of read counts | |
Xing et al. | High-dimensional sparse structured input-output models, with applications to gwas | |
He | STATISTICAL METHODS TO STUDY TRANSPOSON SEQUENCING DATA: NONPARAMETRIC BAYESIAN MODELS WITH SAMPLING ALGORITHMS | |
Milite et al. | Genotyping Copy Number Alterations from single-cell RNA sequencing | |
Kang et al. | Haplotype assembly from weighted SNP fragments and related genotype information | |
Ingraham | Probabilistic Models of Structure in Biological Sequences |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventor after: Xue Yu, Ning Wanshan, Xu Haodong, Deng Wangun, Guo Yaping; Inventor before: Xue Ning, Ning Wanshan, Xu Haodong, Deng Wangun, Guo Yaping |
GR01 | Patent grant | ||