CN110291589B - Biological substance analysis method and apparatus, and computer-readable storage medium

Info

Publication number
CN110291589B
Authority
CN
China
Prior art keywords
biological
series data
groups
time series
biological substances
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201880011585.3A
Other languages
Chinese (zh)
Other versions
CN110291589A (en)
Inventor
前田青广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujifilm Corp
Original Assignee
Fujifilm Corp

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/20 Probabilistic models
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/30 Dynamic-time models
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Chemical & Material Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Organic Chemistry (AREA)
  • Physiology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Immunology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • Hematology (AREA)
  • Urology & Nephrology (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • General Physics & Mathematics (AREA)

Abstract

The invention provides a biological substance analysis method and apparatus and a computer-readable storage medium. The biological substance analysis method comprises the following steps: preparing, for each biological substance, time series data in which values representing the amounts or states of a plurality of biological substances are acquired at a plurality of time points; dividing the plurality of biological substances into a plurality of groups according to the time variation of the time series data of each biological substance; generating, by arithmetic processing, representative time series data representing the state of each group from the time series data of at least 1 biological substance contained in that group; and estimating the dependency relationship between the groups by arithmetic processing based on the representative time series data of each group.

Description

Biological substance analysis method and apparatus, and computer-readable storage medium
Technical Field
The present disclosure relates to a biological substance analysis method and apparatus and a computer-readable storage medium for estimating a dependency relationship when a plurality of biological substances function in an organism.
Background
In vivo, many (several tens of thousands or more in humans) genes exist as RNA (ribonucleic acid) and proteins. The amount or state (e.g., chemical modification) of these genes is interdependent and changes over time.
Further, since these genes act in the living body through such dependency relationships, when cells are treated with, for example, a drug, the mechanism of action of the drug can be clarified by analyzing what dependency relationships exist among the amounts or states of these genes as they change with time.
In this regard, International Publication No. 2010/064414 (patent document 1) shows a method of grouping a plurality of genes according to the similarity of the time variation of their expression amounts. Further, U.S. Patent Application Publication No. 2009/0110280 (patent document 2) shows a method of grouping a plurality of genes based on the similarity of expression data and the similarity of biological functions related to the expression data.
However, patent document 1 and patent document 2 do not propose any method for investigating the dependency relationship between the expression levels of a plurality of genes. It is difficult to know the mechanism of action of the above-described agents and the like by merely grouping a plurality of genes.
International publication No. 2004/048532 (patent document 3) shows a method of estimating the dependency relationship of a plurality of genes from time-series data of the expression levels of these genes.
Disclosure of Invention
Technical problem to be solved by the invention
However, in the method described in patent document 3, if the number of measurement time points is smaller than the number of genes to be analyzed, it is difficult to determine the estimation result. Because gene expression is measured at only a limited number of time points, the number of measurement time points is often far smaller than the number of genes, and in most such cases the estimation result is indeterminate.
This problem occurs for the following reason. When the number of measured time points is smaller than the number of genes, the larger the gap between the two numbers, the more likely it is that the time series data of several genes show similar changes over time. An algorithm that estimates dependency relationships from time series data then has difficulty distinguishing between those genes, and it becomes difficult to determine the estimation result.
The present disclosure has been made in view of the above-described problems, and an object thereof is to provide a biological substance analysis method and apparatus and a computer-readable storage medium which facilitate determination of a result of estimation when estimating a dependency relationship between biological substances from time-series data of amounts or state values of a plurality of biological substances measured at a plurality of points, even when the number of points of measurement of the data is smaller than the number of biological substances.
Means for solving the technical problems
The biological substance analysis method of the present disclosure includes the steps of: preparing, for each biological substance, time series data in which values representing the amounts or states of a plurality of biological substances are acquired at a plurality of time points; dividing the plurality of biological substances into a plurality of groups according to the time variation of the time series data of each biological substance; generating, by arithmetic processing, representative time series data representing the state of each group from the time series data of at least 1 biological substance contained in that group; and estimating the dependency relationship between the groups by arithmetic processing based on the representative time series data of each group.
Also, in the above-described biological substance analysis method of the present disclosure, a plurality of biological substances can be divided into a plurality of groups according to the similarity of the time variation of the time series data of each biological substance and the similarity of biological functions of the respective biological substances.
In the above-described biological substance analysis method of the present disclosure, the similarity of biological functions of each biological substance can be evaluated based on the genetic entity of each biological substance, the canonical pathway of each biological substance, the upstream factor possessed by each biological substance, the expression system of each biological substance, or the disease associated with each biological substance.
Also, in the above-described biological substance analysis method of the present disclosure, when a plurality of biological substances are divided into a plurality of groups, it is possible to allow at least 1 biological substance to belong to the plurality of groups.
Also, in the above-described biological substance analysis method of the present disclosure, a plurality of reference time series data are prepared in advance, and by comparing the plurality of reference time series data with time series data of each biological substance, the plurality of biological substances can be divided into a plurality of groups.
In the above-described biological material analysis method of the present disclosure, when estimating the dependency relationship between groups, the representative time series data of each group can be expressed as a function of the representative time series data of the other groups.
In the above-described biological substance analysis method of the present disclosure, when estimating the dependency relationship between groups, the value of the 1 st time point of the representative time series data of each group can be expressed as a function of the value of the 2 nd time point before the 1 st time point of the representative time series data of the other groups.
In the above-described biological substance analysis method of the present disclosure, when estimating the dependency relationship between groups, the representative time series data of each group can be expressed as a conditional probability or a conditional probability density function of the representative time series data of the other group.
In the above-described biological material analysis method of the present disclosure, the representative time series data of each group can be set to the average value, median, mode, variance, standard deviation, or a moment of third or higher order of the values at each time point of the time series data of the biological materials belonging to that group.
In the above-described biological material analysis method of the present disclosure, the value indicating the amount of the biological material can be set to a value indicating the expression amount, the presence amount, the concentration, or the density of the biological material.
In the above-described biological material analysis method of the present disclosure, the value indicating the state of the biological material can be a value indicating the presence or absence of expression of the biological material, a value indicating the presence or absence of chemical modification, or a value indicating the ratio of the biological material having chemical modification to the biological material having no chemical modification.
In the above-described method for analyzing a biological substance according to the present disclosure, the plurality of biological substances may include at least 1 of DNA (deoxyribonucleic acid), RNA (ribonucleic acid), proteins, and low-molecular compounds in a living body.
In the above-described biological substance analysis method according to the present disclosure, each group can be set as a node, and a network diagram can be generated in which nodes corresponding to groups having a dependency relationship are connected by an edge.
In the above-described biological material analysis method according to the present disclosure, character information or a graph relating to a biological function of a group corresponding to each node, a graph indicating a name, a symbol, a structure, or a composition of a biological material contained in the group corresponding to each node, or character information relating to the biological material may be added to a network graph and displayed.
In the above-described biological material analysis method according to the present disclosure, the selection of the node included in the network map is received, and character information or map related to the biological function of the group corresponding to the selected node, a map indicating the name, symbol, structure, or composition of the biological material included in the group corresponding to the selected node, or character information related to the biological material can be added to the network map and displayed.
The biological substance analysis device of the present disclosure includes: a storage unit that stores time-series data in which values indicating the amounts or states of a plurality of biological substances are acquired at a plurality of points in time for each biological substance; a classification unit that classifies a plurality of biological substances into a plurality of groups according to time-series data of each biological substance; and a dependency relationship estimating unit for generating representative time series data representing the states of the respective groups based on time series data of at least 1 or more biological substances contained in the respective groups, and estimating the dependency relationship between the groups based on the representative time series data of the respective groups.
The biological substance analysis program of the present disclosure causes a computer to execute the steps of: a step of storing time series data in which values indicating the amounts or states of a plurality of biological substances are acquired at a plurality of time points for each biological substance; a step of dividing the plurality of biological substances into a plurality of groups according to time variation of time series data of each biological substance; and a step of generating representative time series data representing the states of the groups based on the time series data of at least 1 or more biological substances contained in the groups, and estimating the dependency relationship between the groups by calculation processing based on the representative time series data of the groups.
Effects of the invention
According to the biological substance analysis method and apparatus and program of the present disclosure, time series data representing values of amounts or states of a plurality of biological substances are prepared for each biological substance at a plurality of time points, respectively, and the plurality of biological substances are divided into a plurality of groups according to time variations of the time series data of each biological substance. Then, representative time series data representing the states of the respective groups are generated by an arithmetic process based on time series data of at least 1 or more biological substances contained in the respective groups, and the dependency relationship between the groups is estimated by the arithmetic process based on the representative time series data of the respective groups.
As described above, when the biological materials are grouped based on their time series data and the dependency relationship between the groups is estimated, the difference between the number of groups and the number of measurement time points is smaller than the difference between the number of biological materials and the number of measurement time points, so the estimation result of the dependency relationship can be determined more easily.
Drawings
Fig. 1 is a flowchart for explaining embodiment 1 of the biological substance analysis method of the present disclosure.
Fig. 2 is a diagram showing an example of time series data for each biological material.
Fig. 3 is an explanatory diagram for explaining the grouping of a plurality of time series data.
Fig. 4 is an explanatory diagram showing acquisition of 1 representative time series data from time series data of a plurality of biological substances contained in a group.
Fig. 5 is a conceptual diagram for explaining a method of predicting dependencies between groups from representative time series data of the groups.
Fig. 6 is a diagram for explaining an outline of a method of estimating a dependency relationship by the Bayesian network method.
Fig. 7 is a flowchart for explaining embodiment 2 of the biological substance analysis method of the present disclosure.
Fig. 8 is a diagram for explaining a method of grouping according to the similarity of time-series data of each biological substance and the similarity of biological functions of the respective biological substances.
Fig. 9 is a diagram for explaining that 1 biological substance is allowed to belong to a plurality of groups.
Fig. 10 is a diagram showing an example of preset reference time series data.
Fig. 11 is a diagram showing an example in which the names of groups and the names of biological substances are added to the network diagram.
Fig. 12 is a diagram for explaining a method of grouping according to the change in the value of time series data between 2 adjacent time points.
Fig. 13 is a diagram showing an example of a network diagram.
Fig. 14 is a block diagram showing a schematic configuration of a biological material analysis system according to an embodiment of the biological material analysis device according to the present disclosure.
Detailed Description
Embodiment 1 of the biological material analysis method of the present disclosure will be described in detail below with reference to the accompanying drawings. Fig. 1 is a flowchart for explaining a biological material analysis method according to the present embodiment.
In the biological material analysis method according to the present embodiment, first, time series data is prepared in which values indicating the amounts or states of a plurality of biological materials are acquired at a plurality of time points for each biological material (S10).
The plurality of biological substances include, for example, at least 1 of DNA (deoxyribonucleic acid), RNA (ribonucleic acid), proteins, and low-molecular compounds in the living body. More specifically, the plurality of biological substances may be substances of different genes, such as RNA of gene A, RNA of gene B, and RNA of gene C, or may be a combination of DNA and RNA.
As the value indicating the amount of the biological material, for example, the expression amount, the presence amount, the concentration, the density, or the like of the biological material can be used. As the value indicating the state of the biological material, a value indicating the presence or absence of expression of the biological material, a value indicating the presence or absence of chemical modification, or the ratio of the biological material having chemical modification to the biological material having no chemical modification can be used.
As the value indicating the presence or absence of expression of a biological substance or the presence or absence of a chemical modification, a value indicating "presence" and a value indicating "absence" are set in advance; for example, "presence" is set to "1" and "absence" is set to "0". The chemical modification may be, for example, phosphorylation or methylation.
The values indicating the amounts or states of the plurality of biological substances may be obtained by performing, for example, microarray measurement, or may be prepared by obtaining data stored in a public database via the Internet or the like. As such a database, Gene Expression Omnibus (GEO) can be used, for example. More specifically, data measuring gene expression during organ formation in human embryos can be used (refer to https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE18887). This data was obtained by Fang et al. (Dev Cell, 19(1): 174-84, 2010).
Values representing the amount or state of each of the plurality of biological substances are measured at a plurality of time points and acquired as time series data. Fig. 2 is a diagram showing an example of time series data acquired for each of biological materials 1 to N.
Next, in the biological material analysis method of the present embodiment, a plurality of biological materials are divided into a plurality of groups according to the similarity of the time series data of each biological material shown in fig. 2 (S12). Specifically, as shown in fig. 3, time series data that are similar to each other are combined into 1 group, and the biological substances corresponding to those time series data are thereby combined into 1 group. Regarding the similarity, for example, a degree of similarity between time series data may be calculated, and time series data whose degree of similarity is equal to or greater than a predetermined threshold may be combined into 1 group.
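The threshold-based grouping of step S12 can be sketched as follows. This is only an illustration under stated assumptions: the disclosure does not fix a similarity measure or a grouping procedure, so the Pearson correlation, the greedy assignment to the first matching group, the function name `group_by_similarity`, and the threshold 0.9 are all hypothetical choices.

```python
import numpy as np

def group_by_similarity(series, threshold=0.9):
    """Greedily group rows of `series` (substances x time points).

    A substance joins the first group whose seed series correlates with it
    at or above `threshold`; otherwise it starts a new group.
    Assumption: Pearson correlation stands in for the "degree of similarity".
    """
    series = np.asarray(series, dtype=float)
    corr = np.corrcoef(series)              # pairwise Pearson correlation
    seeds, groups = [], []
    for i in range(series.shape[0]):
        for g, seed in enumerate(seeds):
            if corr[i, seed] >= threshold:
                groups[g].append(i)
                break
        else:                               # no existing group was similar enough
            seeds.append(i)
            groups.append([i])
    return groups

data = [
    [1.0, 2.0, 3.0, 4.0],   # substance 0: rising
    [2.0, 4.1, 6.0, 8.2],   # substance 1: rising, similar shape to 0
    [4.0, 3.0, 2.0, 1.0],   # substance 2: falling
    [8.0, 6.1, 3.9, 2.0],   # substance 3: falling, similar shape to 2
]
groups = group_by_similarity(data)
print(groups)
```

A series that is constant over time has no defined correlation, so such rows would need separate handling in practice.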
Next, as described above, after the plurality of biological substances are divided into a plurality of groups, representative time series data indicating the states of the groups are generated by the arithmetic processing based on the time series data of at least 1 or more biological substances included in each group (S14). Fig. 4 is an explanatory diagram showing the acquisition of 1 representative time series data from time series data of biological substances 1 to 5 contained in 1 group.
The representative time series data of a group may be generated by calculating, for example, the average value, median, mode, variance, standard deviation, or a moment of third or higher order of the time series data of the biological substances belonging to the group. Specifically, in the example shown in fig. 4, the average of the values at time 1 of the time series data of biological substances 1 to 5 is calculated and set as the value at time 1 of the representative time series data. Likewise, the average of the values at time 2 is set as the value at time 2 of the representative time series data, and so on up to time n, thereby generating the representative time series data.
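The per-time-point averaging of step S14 can be illustrated with a short sketch. The function name `representative_series` and the sample values are hypothetical; the `statistic` argument is an assumed convenience so that, for example, `np.median` can be substituted for the average.

```python
import numpy as np

def representative_series(series, groups, statistic=np.mean):
    """Return one representative time series per group.

    `series` has shape (substances, time points); `groups` is a list of
    lists of row indices. The statistic is taken across the members of a
    group, separately at each time point.
    """
    series = np.asarray(series, dtype=float)
    return np.array([statistic(series[members], axis=0) for members in groups])

data = [
    [1, 2, 3],     # substance 0
    [3, 4, 5],     # substance 1
    [10, 8, 6],    # substance 2
]
groups = [[0, 1], [2]]
rep = representative_series(data, groups)
print(rep)
```

A group with a single member simply reuses that member's time series as its representative.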
Next, in the biological material analysis method according to the present embodiment, the dependency relationship between groups is estimated by the arithmetic processing based on the generated representative time series data of each group as described above (S16).
The method of estimating the dependency relationship between groups based on the group representative time series data will be schematically described with reference to fig. 5. Representative time series data of group X, group Y and group Z are shown in fig. 5, respectively. Further, in the representative time series data of the group X, the value at the time t becomes maximum, and in the representative time series data of the group Y, the value at the time t+1 becomes maximum. That is, the representative time series data of the group Y increases to be the maximum value in accordance with the value of the representative time series data of the group X becoming the maximum value. Further, after the value of the representative time series data of the group X becomes maximum at the time point t+1, the value of the representative time series data of the group Z is greatly reduced at the time point t+2.
As described above, when the representative time series data of a 1st group and the representative time series data of a 2nd group are estimated to change in association with each other, it is estimated that the 1st group and the 2nd group have a dependency relationship.
For example, as shown in fig. 6A and 6B, the dependency relationship between groups described above can be modeled by expressing the value at the 1st time point (time t) of the representative time series data of each group as a function of the value at the 2nd time point (time t-1), which precedes the 1st time point, of the representative time series data of the other groups. Thus, the state of each group at a specific time point can be expressed as depending on the past states of the other groups. In fig. 6A, α to ζ each represent a group. For example, when the representative time series data of the group α changes at time t in response to a change in the representative time series data of another group at time t-1, it is estimated that there is a dependency relationship between the group α and that other group. In the example shown in fig. 6A, it is assumed that group α depends on group β, group γ depends on group β, group δ depends on groups α, β, and γ, group ε depends on groups β and γ, and group ζ depends on group β.
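The idea of expressing each group's value at time t as a function of the other groups' values at time t-1 can be sketched with a linear least-squares fit. This is a simplified stand-in, not the Bayesian network method of the embodiment: the linear form, the function name `lagged_dependencies`, and the sample data are assumptions made for illustration.

```python
import numpy as np

def lagged_dependencies(rep):
    """Fit x_i(t) ~ sum_{j != i} A[i, j] * x_j(t-1) + b_i by least squares.

    `rep` has shape (groups, time points). Returns the coefficient matrix A;
    a large |A[i, j]| suggests that group i depends on the past of group j.
    """
    rep = np.asarray(rep, dtype=float)
    n_groups = rep.shape[0]
    past, present = rep[:, :-1], rep[:, 1:]
    coef = np.zeros((n_groups, n_groups))
    for i in range(n_groups):
        others = [j for j in range(n_groups) if j != i]
        # Design matrix: past values of the other groups plus an intercept.
        design = np.column_stack([past[others].T, np.ones(past.shape[1])])
        solution, *_ = np.linalg.lstsq(design, present[i], rcond=None)
        coef[i, others] = solution[:-1]
    return coef

x = [0, 1, 0, 2, 1, 3, 0]    # group X
y = [0, 0, 2, 0, 4, 2, 6]    # group Y: twice the value of X one step earlier
z = [1, 1, 2, 0, 1, 0, 2]    # group Z: unrelated to Y
A = lagged_dependencies([x, y, z])
print(np.round(A[1], 3))     # coefficients of X and Z in the model for Y
```

The fit is only well determined when the number of time steps exceeds the number of predictors, which is the reason the method reduces many substances to a small number of groups before estimating dependencies.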
As described above, when the value of the representative time series data of each group is expressed as a function of the value of the representative time series data of the other groups, it may be expressed as a conditional probability or a conditional probability density function of the representative time series data of the other groups. Since data describing the behavior of biological materials contains noise, the behavior occurring in the living body can be estimated more accurately by using such a probabilistic description.
In the present embodiment, an example is shown in which the dependency relationship between groups is modeled by using a Bayesian network, but the present invention is not limited to this, and modeling may be performed by other known methods such as a Boolean network or a system of differential equations.
Fig. 6B is an example of a network diagram in which each group is a node and nodes corresponding to groups having a dependency relationship are connected by edges. The network diagram shown in fig. 6B can be displayed on a display device or the like, for example. By displaying the network diagram in this way, the user can grasp the dependency relationships more easily.
According to the biological material analysis method of the above embodiment, the biological materials are grouped based on their time series data, and the dependency relationship between the groups is estimated. The difference between the number of groups and the number of measurement time points is therefore smaller than the difference between the number of biological materials and the number of measurement time points, so the estimation result of the dependency relationship can be determined more easily.
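As an illustration, the dependencies listed for fig. 6A can be turned into the nodes and edges of such a network diagram. The ASCII group names, the function name `to_edges`, and the edge direction (from the group depended upon to the dependent group) are assumptions; the text does not state how fig. 6B orients its edges.

```python
# Groups that each group depends on, as listed for fig. 6A in the text.
# No dependency is listed for group beta.
depends_on = {
    "alpha": ["beta"],
    "beta": [],
    "gamma": ["beta"],
    "delta": ["alpha", "beta", "gamma"],
    "epsilon": ["beta", "gamma"],
    "zeta": ["beta"],
}

def to_edges(depends_on):
    """Return directed edges (parent, child), one per dependency."""
    return [(parent, child)
            for child, parents in depends_on.items()
            for parent in parents]

nodes = sorted(depends_on)
edges = to_edges(depends_on)
for parent, child in edges:
    print(f"{parent} -> {child}")
```

The resulting node and edge lists can be passed to any graph drawing tool to render the diagram.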
Next, embodiment 2 of the method for analyzing a biological material according to the present disclosure will be described. Fig. 7 is a flowchart for explaining the biological material analysis method according to the present embodiment. In the above-described biological material analysis method according to embodiment 1, the plurality of biological materials are grouped based on the similarity of the time series data of each biological material, but in embodiment 2, the grouping further takes into account the similarity of biological functions of the biological materials (S22).
For example, as shown in fig. 8, after the biological substances are grouped into group 1 and group 2 according to the similarity of the time variation of their time series data, group 1 is further divided into group 1_1, group 1_2 and group 1_3, and group 2 into group 2_1 and group 2_2, according to the similarity of biological functions of the biological substances belonging to each group.
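The two-stage grouping of fig. 8 can be sketched as follows. The sketch assumes that each substance carries exactly one functional label (here a hypothetical pathway name) and that substances with the same label count as functionally similar; the names `split_by_function`, `g1` to `g7`, and the labels are illustrative only.

```python
from collections import defaultdict

def split_by_function(groups, annotation):
    """Split each time-based group into subgroups sharing an annotation.

    `groups` maps a group name to its member substances; `annotation` maps
    each substance to a single functional label. Subgroups are named
    "<group>_<k>" in sorted label order.
    """
    refined = {}
    for name, members in groups.items():
        by_label = defaultdict(list)
        for substance in members:
            by_label[annotation[substance]].append(substance)
        for k, label in enumerate(sorted(by_label), start=1):
            refined[f"{name}_{k}"] = by_label[label]
    return refined

groups = {"1": ["g1", "g2", "g3", "g4"], "2": ["g5", "g6", "g7"]}
annotation = {
    "g1": "pathway A", "g2": "pathway B", "g3": "pathway A", "g4": "pathway C",
    "g5": "pathway A", "g6": "pathway D", "g7": "pathway D",
}
refined = split_by_function(groups, annotation)
print(refined)
```

With these sample labels, group 1 splits into three subgroups and group 2 into two, mirroring the group names used in the fig. 8 example.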
The evaluation of the similarity of biological functions of biological substances may be performed, for example, on the basis of whether the biological substances have a common gene body, belong to a common canonical pathway, have a common upstream factor, are related to a common expression system, or are related to a common disease.
In addition, when the plurality of biological substances are grouped based on both the similarity of the time-series data of each biological substance and the similarity of the biological functions of the biological substances as described above, at least 1 biological substance may be allowed to belong to a plurality of groups. Fig. 9 is a diagram showing an example in which 1 biological substance belongs to a plurality of groups. Each black circle shown in fig. 9 represents 1 biological substance. In the example shown in fig. 9, there are biological substances belonging to 2 groups and biological substances belonging to 3 groups.
In many cases, a biological substance is related to a plurality of biological functions. By allowing 1 or more biological substances to belong to 2 or more groups as described above, an estimation result that more accurately reflects the behavior actually occurring in the living body is obtained.
In the biological substance analysis method of embodiment 2, the steps other than the grouping that takes the biological functions of the biological substances into account (S20, S24, and S26 shown in fig. 7) are the same as those of embodiment 1.
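The overlapping membership described above can be illustrated with a minimal Python sketch. The function name `split_by_function`, the gene names, and the function labels are all hypothetical and are not taken from the embodiment; the sketch only assumes that each biological substance carries a set of function labels.

```python
from collections import defaultdict


def split_by_function(members, annotations):
    """Split one time-pattern group into functional subgroups.

    members: names of the substances in the group.
    annotations: dict mapping a substance name to a set of function labels
                 (hypothetical labels, for illustration only).
    A substance with several labels is placed in several subgroups.
    """
    subgroups = defaultdict(set)
    for name in members:
        for label in annotations.get(name, ()):
            subgroups[label].add(name)
    return dict(subgroups)


# Hypothetical members of one group obtained from the time-series grouping.
group_1 = ["geneA", "geneB", "geneC"]
annotations = {
    "geneA": {"transcription"},
    "geneB": {"transcription", "cell adhesion"},  # belongs to 2 subgroups
    "geneC": {"cell adhesion"},
}
subgroups = split_by_function(group_1, annotations)
```

Here `geneB` appears in both subgroups, which corresponds to a black circle lying in the overlap of 2 groups in fig. 9.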
When a large number of biological substances are grouped into a very small number of groups based only on the similarity of the time-series data of each biological substance, as in the biological substance analysis method of embodiment 1, estimating the dependency relationship between the groups may not be sufficiently useful for a purpose such as learning the mechanism of action of a drug. According to the biological substance analysis method of embodiment 2, the plurality of biological substances are grouped based on both the similarity of the time-series data and the similarity of the biological functions, so this problem can be reduced or eliminated.
In addition, even if the dependency relationship between individual biological substances is estimated, it may be difficult for a person to understand its biological significance. In contrast, in the biological substance analysis method according to embodiment 2, the biological substances are grouped according to the similarity of their biological functions, so when the dependency relationship between the groups is estimated, the estimation result can be interpreted in units of function and is therefore easy to understand.
In the biological substance analysis method according to the above embodiment, the plurality of biological substances are grouped by calculating the similarity of the time series data of each biological substance, but the grouping method is not limited to this. For example, as shown in fig. 10, a plurality of reference time series data may be prepared in advance, and the plurality of biological substances may be divided into a plurality of groups by comparing the reference time series data with the time series data of each biological substance. That is, biological substances whose time series data are similar to the same reference time series data may be grouped so as to belong to the same group.
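The reference-based grouping can be sketched as follows. This is a minimal illustration that assumes Pearson correlation as the similarity measure; the embodiment does not specify a measure, and the function name `assign_to_reference` and the sample values are hypothetical.

```python
import numpy as np


def assign_to_reference(series, references):
    """Return, for each series, the index of the most similar reference.

    Similarity is assumed here to be the Pearson correlation; other
    measures could be substituted.
    """
    series = np.asarray(series, dtype=float)
    references = np.asarray(references, dtype=float)

    def standardize(x):
        # Center each row and scale it to unit length so that a dot
        # product between two rows equals their Pearson correlation.
        x = x - x.mean(axis=1, keepdims=True)
        norm = np.linalg.norm(x, axis=1, keepdims=True)
        return x / np.where(norm == 0, 1.0, norm)

    corr = standardize(series) @ standardize(references).T
    return corr.argmax(axis=1)


# Two hypothetical reference profiles: rising and falling.
references = [[0, 1, 2, 3], [3, 2, 1, 0]]
series = [
    [1.0, 1.5, 2.5, 4.0],  # rising
    [5.0, 3.0, 2.0, 1.0],  # falling
    [0.1, 0.2, 0.2, 0.4],  # rising, small amplitude
]
labels = assign_to_reference(series, references)
```

Series that receive the same label are similar to the same reference time series data and would therefore be placed in the same group.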
When the network diagram shown in fig. 6B is displayed on the display device, the name of each group and the names of the biological substances included in the group may be displayed at the node corresponding to the group, as shown in fig. 11. The displayed information is not limited to the names of the biological substances; other character information related to the biological substances, a symbol indicating a biological substance, the structure of a biological substance, a diagram indicating the composition of a biological substance, or the like may be added to the network diagram and displayed. Further, character information, a diagram, or the like relating to the biological function shared by the biological substances contained in a group may be added to the network diagram and displayed.
In addition, when the names of the groups and the like are displayed as described above and the number of nodes is large, it may be difficult to display the names for all the nodes. Therefore, for example, a selection of a node included in the network diagram may be received through an input instruction from an input device such as a mouse or a keyboard, and the name of the group, the names of the biological substances, and the like may be displayed only for the selected node. In this way, the user can display only the information of interest, and the network diagram becomes easier to read.
Next, a specific example of the biological substance analysis method according to embodiment 2 will be described. Here, an example of analyzing data obtained by measuring gene expression during organogenesis in human embryos will be described.
First, the above gene expression measurement data were obtained from Gene Expression Omnibus, a public database. These data were obtained by using a microarray to measure the expression of most genes at 6 time points, Carnegie stages 9 to 14 (the Carnegie stage is a criterion for staging development according to the morphological characteristics of the embryo).
For each gene, the time series data were calculated by normalizing the expression values at the 6 time points, that is, by converting each value into its difference from the time average of that gene.
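The normalization step can be sketched in a few lines of Python. The expression values below are made up for illustration, and the function name `center_on_time_average` is hypothetical.

```python
import numpy as np


def center_on_time_average(expression):
    """Convert each gene's values into differences from its time average.

    expression: array of shape (genes, time points).
    """
    expression = np.asarray(expression, dtype=float)
    return expression - expression.mean(axis=1, keepdims=True)


# Two hypothetical genes measured at 6 time points.
expression = np.array([
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],  # time average is 7
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],    # constant expression
])
normalized = center_on_time_average(expression)
```

After this step every gene has a time average of zero, so genes with different absolute expression levels can be compared by the shape of their time variation.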
Next, time series data with similar time variation are grouped together. Specifically, for each of the 5 pairs of adjacent time points (for example, the transition from stage 9 to stage 10), it is determined whether the value of the time series data increases, is unchanged, or decreases, and the time series data are classified into 243 (3^5) groups. In this classification, time series data determined to be "unchanged" for all 5 pairs of adjacent time points are excluded from the analysis described later.
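The classification into increase / unchanged / decrease patterns can be sketched as follows. The specific example does not state the criterion used to decide "unchanged", so the `threshold` parameter here is an assumption, and the function names and sample values are hypothetical.

```python
import numpy as np


def trend_pattern(values, threshold):
    """Encode a series as +1 / 0 / -1 for each pair of adjacent time points.

    A change whose absolute value does not exceed `threshold` is treated
    as "unchanged" (the threshold is an assumption of this sketch).
    """
    diffs = np.diff(np.asarray(values, dtype=float))
    codes = np.where(diffs > threshold, 1, np.where(diffs < -threshold, -1, 0))
    return tuple(int(c) for c in codes)


def group_by_pattern(series_by_gene, threshold=0.5):
    """Group genes that share the same trend pattern."""
    groups = {}
    for gene, values in series_by_gene.items():
        pattern = trend_pattern(values, threshold)
        if not any(pattern):
            # "Unchanged" at every transition: excluded from later analysis.
            continue
        groups.setdefault(pattern, []).append(gene)
    return groups


# Hypothetical series with 6 time points, hence 5 transitions each.
series_by_gene = {
    "geneA": [0, 1, 2, 3, 4, 5],
    "geneB": [0, 2, 4, 6, 8, 10],
    "geneC": [3, 3, 3, 3, 3, 3],
    "geneD": [5, 4, 4, 4, 6, 6],
}
groups = group_by_pattern(series_by_gene)
```

With 6 time points there are 5 transitions and 3 possible codes per transition, which gives the 3^5 = 243 patterns; dropping the all-unchanged pattern leaves 242 patterns that can carry genes into the later analysis.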
Next, as shown in fig. 8, within each group classified as described above, genes having similar biological functions are further grouped together, so that the plurality of genes included in the group are divided into subgroups. Specifically, genes with similar gene ontology terms (http://www.geneontology.org/) were grouped using Functional Annotation Clustering of DAVID (https://david.ncifcrf.gov/), a public Web tool. At this time, genes assigned to 2 or more groups are allowed to exist.
Thus, 468 groups were obtained as a result of grouping according to the similarity of time-series data of each gene and the similarity of biological functions of each gene.
Next, representative time series data of each group is generated by calculating an average value of values of time series data of genes belonging to each group.
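A minimal sketch of this averaging step is given below. The group names, gene names, and values are hypothetical; `geneB` is deliberately placed in 2 groups to show that overlapping membership needs no special handling.

```python
import numpy as np


def representative_series(groups, series_by_gene):
    """Average, at each time point, the series of the genes in each group."""
    return {
        name: np.mean([series_by_gene[gene] for gene in members], axis=0)
        for name, members in groups.items()
    }


series_by_gene = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [3.0, 2.0, 1.0],
    "geneC": [0.0, 0.0, 6.0],
}
# geneB belongs to both groups.
groups = {"group_1": ["geneA", "geneB"], "group_2": ["geneB", "geneC"]}
representatives = representative_series(groups, series_by_gene)
```

Each representative series has the same number of time points as the gene-level series, so the dependency estimation that follows works on one series per group instead of one per gene.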
Then, the dependency relationship between the representative time series data of the 468 groups is estimated using a Bayesian network method. Specifically, the dependency relationship is estimated by supplying the 468 sets of representative time series data to the SiGN-BN software (http://sign.hgc.jp/).
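For orientation, the general form of a Bayesian network over the representative time series data can be written as below. This is the textbook factorization, not a description of the specific model used inside SiGN-BN; the symbols X_j (the representative time series data of group j) and Pa(X_j) (the set of groups on which group j directly depends) are notation introduced here for illustration.

```latex
P(X_1, \dots, X_n) \;=\; \prod_{j=1}^{n} P\bigl(X_j \mid \mathrm{Pa}(X_j)\bigr), \qquad n = 468
```

Estimating the dependency relationship then amounts to choosing the parent sets Pa(X_j), and each chosen parent-child pair corresponds to one dependency between 2 groups.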
Then, based on the estimated dependency relationship, each group is set as a node and the nodes having a dependency relationship are connected to each other by an edge, thereby generating a network diagram as shown in fig. 13. As a result, a hierarchy is obtained in which a small number of groups having functions that govern organogenesis as a whole control a large number of groups involved in the formation of individual organs, which matches biological insight into organogenesis. In fig. 13, the 2 groups indicated by bold circles are located most upstream in the network and mainly contain transcription factor genes as components. 85% of all groups lie downstream of these 2 groups, showing that a small number of groups control a large number of groups.
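The construction of the network and the check of how many groups lie downstream of an upstream group can be sketched as follows. The edge list is a small made-up example, not the result of the analysis, and the function names are hypothetical.

```python
def build_network(dependencies):
    """Build an adjacency dict from (parent, child) dependency pairs."""
    children = {}
    for parent, child in dependencies:
        children.setdefault(parent, set()).add(child)
        children.setdefault(child, set())
    return children


def root_nodes(children):
    """Nodes with no incoming edge, i.e. the most upstream groups."""
    has_parent = {c for kids in children.values() for c in kids}
    return sorted(n for n in children if n not in has_parent)


def descendants(children, start):
    """All nodes reachable from `start` by following edges downstream."""
    seen, stack = set(), [start]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical estimated dependencies between 6 groups.
edges = [("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g5", "g6")]
network = build_network(edges)
roots = root_nodes(network)
downstream_of_g1 = descendants(network, "g1")
fraction = len(downstream_of_g1) / len(network)
```

The same kind of traversal, applied to the estimated network, is one way to obtain a figure such as the share of groups located downstream of the most upstream groups.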
Next, a biological material analysis system for performing the biological material analysis methods according to embodiments 1 and 2 will be described. Fig. 14 is a block diagram showing a schematic configuration of a biological material analysis system 1 according to an embodiment of the biological material analysis device according to the present disclosure.
As shown in fig. 14, the biological material analysis system 1 includes a biological material analysis device 10, a display device 20, and an input device 30.
The biological substance analysis device 10 is constituted by a computer including a central processing unit, a semiconductor memory, a hard disk, and the like, and one embodiment of the biological substance analysis program of the present disclosure is installed on the hard disk. When the central processing unit executes the biological substance analysis program, the storage unit 11, the classification unit 12, the dependency relationship estimation unit 13, and the control unit 14 shown in fig. 14 function, and the computer executes: a step of storing the time series data of each biological substance; a step of dividing the plurality of biological substances into a plurality of groups based on the time variation of the time series data of each biological substance; a step of generating representative time series data indicating the state of each group based on the time series data of at least 1 biological substance contained in each group; and a step of estimating the dependency relationship between the groups based on the representative time series data of each group.
The storage unit 11 is formed of a storage medium such as a semiconductor memory or a hard disk, and stores time series data of each biological material.
The classification unit 12 classifies the plurality of biological substances into a plurality of groups according to the time-series data of each biological substance. The classification unit 12 performs the grouping using the Functional Annotation Clustering of DAVID described above. The specific grouping method is the same as in the biological substance analysis methods of embodiments 1 and 2 described above. In addition, when the biological substance analysis method according to embodiment 2 is carried out, the biological functions of the biological substances are set in advance in association with the biological substances and their time series data.
The dependency relationship estimating unit 13 generates representative time series data indicating the state of each group based on the time series data of at least 1 biological substance included in each group, and estimates the dependency relationship between the groups based on the representative time series data of each group. The dependency relationship estimating unit 13 estimates the dependency relationship between the groups by using, for example, the SiGN-BN software described above. The specific method for estimating the dependency relationship between the groups is also the same as in the biological substance analysis methods of embodiments 1 and 2.
The control unit 14 is constituted by a central processing unit, and controls the entire biological material analysis device 10.
The display device 20 is constituted by a liquid crystal display or the like, and displays the network map or the like by control of the control unit 14.
The input device 30 is constituted by a mouse, a keyboard, or the like, and receives a selection of any one of the plurality of nodes included in the network diagram displayed on the display device 20. When the selection of a node is received by the input device 30, the name of the group corresponding to the node, the names of the biological substances contained in the group, and the like are displayed on the display device 20.
The entire disclosure of Japanese application No. 2017-024633 is incorporated herein by reference.
All documents, patent applications, and technical standards described in this specification are incorporated in this specification by reference to the same extent as if each individual document, patent application, or technical standard were specifically and individually indicated to be incorporated by reference.

Claims (14)

1. A method of analyzing a biological substance, comprising the steps of:
preparing time series data obtained by acquiring values representing the amounts or states of a plurality of biological substances at a plurality of time points for each of the biological substances;
dividing the plurality of biological substances into a plurality of groups according to similarity of time variation of time series data of each biological substance;
generating representative time series data representing states of the groups by an arithmetic processing based on the time series data of at least 1 or more biological substances contained in the groups;
estimating the dependency relationship between the groups by an operation process based on the representative time series data of each group; and
displaying the estimated dependency relationship to the user by a display device,
wherein the plurality of biological substances are divided into a plurality of groups based on the similarity of time-varying of the time-series data of each of the biological substances and the similarity of biological functions of the respective biological substances,
when the plurality of biological substances are divided into a plurality of groups, at least 1 biological substance is allowed to belong to a plurality of the groups,
when estimating the dependency relationship between the groups, the representative time series data of each group is expressed as a function of the representative time series data of the other groups.
2. The method for analyzing a biological material according to claim 1, wherein,
the similarity of the biological functions of the biological substances is evaluated based on the gene ontology of the biological substances, the canonical pathway of the biological substances, the upstream factors of the biological substances, the expression system of the biological substances, or the diseases related to the biological substances.
3. The method for analyzing a biological material according to claim 1 or 2, wherein,
a plurality of reference time series data are prepared in advance,
and dividing the plurality of biological substances into a plurality of groups by comparing the plurality of reference time series data with time series data of each of the biological substances.
4. The method for analyzing a biological material according to claim 1, wherein,
when estimating the dependency relationship between the groups, the value of the 1 st time point of the representative time series data of each group is expressed as a function of the value of the 2 nd time point before the 1 st time point of the representative time series data of the other groups.
5. The method for analyzing a biological material according to claim 1, wherein,
when estimating the dependency relationship between the groups, the representative time series data of each group is expressed as a conditional probability or a conditional probability density function of the representative time series data of the other groups.
6. The method for analyzing a biological material according to claim 1 or 2, wherein,
the representative time series data of each group is set as an average value, a median, a mode, a variance, a standard deviation, or a moment of third or higher order of the values at each time point of the time series data of the biological substances belonging to each group.
7. The method for analyzing a biological material according to claim 1 or 2, wherein,
the value indicating the amount of the biological substance is a value indicating the expression amount, the presence amount, the concentration or the density of the biological substance.
8. The method for analyzing a biological material according to claim 1 or 2, wherein,
the value indicating the state of the biological material is a value indicating the presence or absence of expression of the biological material, a value indicating the presence or absence of chemical modification, or a value indicating the ratio of the biological material having chemical modification to the biological material having no chemical modification.
9. The method for analyzing a biological material according to claim 1 or 2, wherein,
the plurality of biological substances can include at least 1 or more of DNA, RNA, proteins, and low-molecular compounds in the living body.
10. The method for analyzing a biological material according to claim 1 or 2, wherein,
and generating a network graph by connecting the nodes corresponding to the groups having the dependency relationship by using edges.
11. The method for analyzing a biological material according to claim 10, wherein,
the character information or the graph related to the biological functions of the group corresponding to each node, the graph representing the names, symbols, structures, or compositions of the biological substances contained in the group corresponding to each node, or the character information related to the biological substances is added to the network graph and displayed.
12. The method for analyzing a biological material according to claim 11, wherein,
receiving a selection of the node contained in the network map,
and character information or a graph relating to a biological function of a group corresponding to the selected node, a graph representing a name, a symbol, a structure, or a composition of a biological substance included in the group corresponding to the selected node, or character information relating to the biological substance is added to the network graph and displayed.
13. A biological substance analysis device is provided with:
a storage unit that stores time series data obtained by acquiring values indicating amounts or states of a plurality of biological substances at a plurality of time points for each of the biological substances;
a classification unit that classifies the plurality of biological substances into a plurality of groups according to a time change of time series data of each biological substance;
a dependency relationship estimating unit configured to generate representative time series data indicating a state of each group based on the time series data of at least 1 or more biological substances included in each group, and estimate a dependency relationship between the groups based on the representative time series data of each group; and
a display device that displays the estimated dependency relationship to a user,
wherein the plurality of biological substances are divided into a plurality of groups based on the similarity of time-varying of the time-series data of each of the biological substances and the similarity of biological functions of the respective biological substances,
when the plurality of biological substances are divided into a plurality of groups, at least 1 biological substance is allowed to belong to a plurality of the groups,
when estimating the dependency relationship between the groups, the representative time series data of each group is expressed as a function of the representative time series data of the other groups.
14. A computer-readable storage medium storing a biological substance analysis program that causes a computer to execute the steps of:
a step of storing time series data obtained by acquiring values representing amounts or states of a plurality of biological substances at a plurality of time points for each of the biological substances;
a step of dividing the plurality of biological substances into a plurality of groups according to a time variation of time series data of each of the biological substances;
a step of generating representative time series data representing the state of each group according to the time series data of at least 1 biological substance contained in each group, and estimating the dependency relationship between the groups according to the representative time series data of each group; and
a step of displaying the estimated dependency relationship to a user by a display device,
wherein the plurality of biological substances are divided into a plurality of groups based on the similarity of time-varying of the time-series data of each of the biological substances and the similarity of biological functions of the respective biological substances,
when the plurality of biological substances are divided into a plurality of groups, at least 1 biological substance is allowed to belong to a plurality of the groups,
when estimating the dependency relationship between the groups, the representative time series data of each group is expressed as a function of the representative time series data of the other groups.
CN201880011585.3A 2017-02-14 2018-01-31 Biological substance analysis method and apparatus, and computer-readable storage medium Active CN110291589B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2017-024633 2017-02-14
JP2017024633 2017-02-14
PCT/JP2018/003207 WO2018150878A1 (en) 2017-02-14 2018-01-31 Biological substance analysis method and device, and program

Publications (2)

Publication Number Publication Date
CN110291589A CN110291589A (en) 2019-09-27
CN110291589B true CN110291589B (en) 2023-08-08

Family

ID=63170624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880011585.3A Active CN110291589B (en) 2017-02-14 2018-01-31 Biological substance analysis method and apparatus, and computer-readable storage medium

Country Status (5)

Country Link
US (1) US20190362811A1 (en)
EP (1) EP3584727A4 (en)
JP (1) JP6851401B2 (en)
CN (1) CN110291589B (en)
WO (1) WO2018150878A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004048532A2 (en) * 2002-11-25 2004-06-10 Gni Usa Inferring gene regulatory networks from time-ordered gene expression data using differential equations
CN103476337A (en) * 2011-04-15 2013-12-25 株式会社日立医疗器械 Biophotonic measurement device, biophotonic measurement device operating method, and biophotonic measurement data analysis and display method
CN105009130A (en) * 2012-10-23 2015-10-28 独立行政法人科学技术振兴机构 Detection device, detection method and detection program which support detection of sign of state transition in living organism on basis of network entropy

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003141123A (en) * 2001-10-30 2003-05-16 Mamoru Kato Computer readable recording medium recording program for estimating control relationship between genes from gene expression amount data and gene array data
JP2004240541A (en) * 2003-02-04 2004-08-26 Hitachi Ltd Method and device for simulating network circuit in parallel distribution environment
JP4590857B2 (en) * 2003-11-17 2010-12-01 ソニー株式会社 Visualization method, visualization apparatus, and information storage medium related to biological material information
JP2007052766A (en) * 2005-07-22 2007-03-01 Mathematical Systems Inc Pathway display method, information processing device, and pathway display program
JP4555256B2 (en) * 2006-05-24 2010-09-29 Necソフト株式会社 Analysis method aiming at feature extraction and comparative classification of time-series gene expression data, and analysis apparatus based on the analysis method
KR100964181B1 (en) * 2007-03-21 2010-06-17 한국전자통신연구원 Clustering method of gene expressed profile using Gene Ontology and apparatus thereof
JP2010157214A (en) 2008-12-02 2010-07-15 Sony Corp Gene clustering program, gene clustering method, and gene cluster analyzing device
JP6278517B2 (en) * 2014-07-22 2018-02-14 Kddi株式会社 Data analysis apparatus and program
JP2017024633A (en) 2015-07-24 2017-02-02 三菱マヒンドラ農機株式会社 Footrest setting structure of work vehicle

Also Published As

Publication number Publication date
US20190362811A1 (en) 2019-11-28
EP3584727A4 (en) 2020-03-04
WO2018150878A1 (en) 2018-08-23
JP6851401B2 (en) 2021-03-31
EP3584727A1 (en) 2019-12-25
JPWO2018150878A1 (en) 2020-01-23
CN110291589A (en) 2019-09-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant